maximilianh / cellBrowser

main repo: https://github.com/ucscGenomeBrowser/cellBrowser/ - Python pipeline and Javascript scatter plot library for single-cell datasets, http://cellbrowser.rtfd.org
GNU General Public License v3.0

Scanpy cbBuild Assertion Error #39

Closed apblair closed 5 years ago

apblair commented 5 years ago

Hi Max,

I'm receiving an assertion error when attempting to build a browser. I didn't receive errors when I generated the cellbrowser.conf file using a Scanpy object. I also double checked the Scanpy object and the cluster metadata and coordinates are present.

Here is an example of the command and error message I am receiving:

./cbBuild -i ../../single_cell/chi/cellBrowserOut/combined_EF13W3D/cellbrowser.conf -o CBOUT

INFO:root:/projects/sysbio/users/apblair/single_cell/chi/cellBrowserOut/combined_EF13W3D/summary.html does not exist
INFO:root:/projects/sysbio/users/apblair/single_cell/chi/cellBrowserOut/combined_EF13W3D/methods.html does not exist
INFO:root:/projects/sysbio/users/apblair/single_cell/chi/cellBrowserOut/combined_EF13W3D/downloads.html does not exist
INFO:root:/projects/sysbio/users/apblair/single_cell/chi/cellBrowserOut/combined_EF13W3D/thumb.png does not exist
INFO:root:Getting md5 of /projects/sysbio/users/apblair/single_cell/chi/cellBrowserOut/combined_EF13W3D/cell_to_cluster.tsv
md5sum: /projects/sysbio/users/apblair/single_cell/chi/cellBrowserOut/combined_EF13W3D/cell_to_cluster.tsv: No such file or directory
Traceback (most recent call last):
  File "./cbBuild", line 9, in <module>
    cellbrowser.convertAndCopyCli()
  File "./cbPyLib/cellbrowser.py", line 2393, in convertAndCopyCli
    convertAndCopy(confFnames, outDir, port)
  File "./cbPyLib/cellbrowser.py", line 2364, in convertAndCopy
    convertDataset(inConf, outConf, datasetDir)
  File "./cbPyLib/cellbrowser.py", line 2120, in convertDataset
    sampleNames, needFilterMatrix, outMeta = convertMeta(inConf, outConf, datasetDir)
  File "./cbPyLib/cellbrowser.py", line 1954, in convertMeta
    outConf["fileVersions"]["inMeta"] = getFileVersion(metaFname)
  File "./cbPyLib/cellbrowser.py", line 1940, in getFileVersion
    hexHash = md5ForFile(fname).decode("ascii")
  File "./cbPyLib/cellbrowser.py", line 2035, in md5ForFile
    md5 = getMd5Using("md5sum", fname).split()[0]
  File "./cbPyLib/cellbrowser.py", line 2028, in getMd5Using
    assert(err==0)
AssertionError

Thanks again for your help! :)
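
The traceback above ends in a bare `assert(err==0)`, which hides the real cause that `md5sum` printed one line earlier: the input file is missing. As a minimal sketch of how such a failure could be reported more directly, the helper below checks for the file first and hashes it with Python's standard `hashlib` instead of shelling out. `md5_for_file` is a hypothetical name used for illustration only; it is not cellBrowser's own code.

```python
import hashlib
import os


def md5_for_file(fname, chunk_size=1 << 20):
    """Return the hex MD5 digest of fname.

    Hypothetical helper for illustration; not part of cellBrowser.
    """
    if not os.path.isfile(fname):
        # Report the actual cause instead of a bare AssertionError.
        raise FileNotFoundError("input file does not exist: %s" % fname)
    digest = hashlib.md5()
    with open(fname, "rb") as handle:
        # Read in chunks so large matrices do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Called on a missing path, this raises `FileNotFoundError` naming the file, which points straight at the absent cell_to_cluster.tsv.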

maximilianh commented 5 years ago

Does /projects/sysbio/users/apblair/single_cell/chi/cellBrowserOut/combined_EF13W3D/cell_to_cluster.tsv exist?

What does "grep cell_to_cluster.tsv cellbrowser.conf" say?

From which directory did you run cbBuild?
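
These questions amount to checking that every file named in cellbrowser.conf actually exists next to it. A minimal sketch of that check follows, assuming the conf file consists of plain Python-style `name = literal` assignments (as the one posted later in this thread does) and that file names are resolved relative to the conf file's directory, which is where the log shows cbBuild looking. `load_conf` and `missing_files` are hypothetical helpers, not part of cellBrowser.

```python
import ast
import os


def load_conf(conf_path):
    """Parse top-level `name = literal` assignments from a conf file."""
    with open(conf_path) as handle:
        tree = ast.parse(handle.read(), filename=conf_path)
    conf = {}
    for node in tree.body:
        if not isinstance(node, ast.Assign) or len(node.targets) != 1:
            continue
        target = node.targets[0]
        if isinstance(target, ast.Name):
            # literal_eval only accepts literals, so no conf code is executed.
            conf[target.id] = ast.literal_eval(node.value)
    return conf


def missing_files(conf_path):
    """List files referenced by the conf that are absent from its directory."""
    conf = load_conf(conf_path)
    base_dir = os.path.dirname(os.path.abspath(conf_path))
    names = [conf[key] for key in ("meta", "exprMatrix") if key in conf]
    for key in ("coords", "markers"):
        names.extend(entry["file"] for entry in conf.get(key, []))
    return [n for n in names if not os.path.isfile(os.path.join(base_dir, n))]
```

Running `missing_files("cellbrowser.conf")` before cbBuild would list any referenced file that was never written.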


apblair commented 5 years ago

The cell_to_cluster.tsv does not exist.

$ grep cell_to_cluster.tsv cellbrowser.conf

meta='cell_to_cluster.tsv'

I ran cbBuild in '/soe/apblair/sysbio_apblair/cellBrowser/src'

maximilianh commented 5 years ago

Was this cellbrowser.conf generated by cbScanpy, or did you make it yourself? Could you post the cellbrowser.conf file here? (If you want to make it shorter, you can remove any comment lines.)


apblair commented 5 years ago

I generated the cellbrowser.conf with cellbrowser.scanpyToTsv().

Do you mean to ask if I used cbScanpy to generate the scanpy object and then the cellbrowser.conf? If so, then no: I made the scanpy object myself.

Here is the cellbrowser.conf file:

name='combined_EF13W3D'
shortLabel='combined_EF13W3D'
exprMatrix='exprMatrix.tsv.gz'
tags = ["10x", 'smartseq2']
meta='cell_to_cluster.tsv'
geneIdType='symbols'
clusterField='Louvain CLuster'
labelField='Louvain CLuster'
enumFields=['Louvain CLuster']
markers = [{"file": "markers.tsv", "shortLabel":"Cluster Markers"}]
coords=[{'file': 'tsne_coords.tsv', 'shortLabel': 'T-SNE'}, {'file': 'umap_coords.tsv', 'shortLabel': 'UMAP'}]
radius=5
alpha=0.6

maximilianh commented 5 years ago

If you used cbScanpy, then there really should be a file cell_to_cluster.tsv. Without this file it won't work, as we really need the cluster information. I wonder if this could be some sort of path problem... can you see a cell_to_cluster.tsv file anywhere? It should be in the same directory as where you ran cbScanpy. If it's not there, then cbScanpy probably aborted or had some other problem...

I'm running cbScanpy now on a sample dataset to check if I can reproduce this.

apblair commented 5 years ago

None of the samples have a cell_to_cluster.tsv file generated. I didn't receive an error message when using cellbrowser.scanpyToTsv() though.

Here's my code snippet of cellbrowser.scanpyToTsv() :

    # Working in cellBrowser/src/cbPyLib
    import glob
    import os
    import scanpy as sc
    import cellbrowser

    for gene_matrices in glob.glob('../../../single_cell/chi/Data/version_0.7/in_vivo/*/*/gene_matrices/clustered/*.h5ad'):
        sample_dir = cellBrowserOutPath + gene_matrices.split('/')[-4]
        # Create a directory to store the sample's cellbrowser.conf file
        if not os.path.exists(sample_dir):
            os.makedirs(sample_dir)
        # Generate the cellbrowser.conf file
        # cellbrowser.scanpyToTsv(anndata, outputDirectory, datasetName)
        cellbrowser.scanpyToTsv(sc.read(gene_matrices), 'soe/apblair/sysbio_apblair/single_cell/chi/cellBrowserOut/'+gene_matrices.split('/')[-4], gene_matrices.split('/')[-4])

I also checked my own code snippet and I was able to generate a cell_to_cluster.tsv file using the scanpy object I generated in my pipeline:

    adata_df = pd.DataFrame(adata.obs['louvain'])
    adata_df.index.names = ['Cells']
    adata_df.to_csv('cell_to_cluster.tsv', sep='\t')

apblair commented 5 years ago

When I run cellbrowser.scanpyToTsv() in:

'/soe/apblair/sysbio_apblair/cellBrowser/src/cbPyLib'

with the command:

cellbrowser.scanpyToTsv(sc.read('../../../single_cell/chi/Data/version_0.7/in_vivo/merged_spatiotemporal_analysis/combined_EF13W3D/gene_matrices/clustered/combined_EF13W3D_filtered_clustered.h5ad'), '/soe/apblair/sysbio_apblair/single_cell/chi/cellBrowserOut/combined_EF13W3D', 'combined_EF13W3D')

I get the following output:

Writing matrix to /soe/apblair/sysbio_apblair/single_cell/chi/cellBrowserOut/combined_EF13W3D/exprMatrix.tsv
Writing T-SNE coords to /soe/apblair/sysbio_apblair/single_cell/chi/cellBrowserOut/combined_EF13W3D/tsne_coords.tsv
Writing UMAP coords to /soe/apblair/sysbio_apblair/single_cell/chi/cellBrowserOut/combined_EF13W3D/umap_coords.tsv
Couldnt find ForceAtlas2 coordinates
Couldnt find PAGA+ForceAtlas2 coordinates
Couldnt find PAGA+UMAP coordinates
Couldnt find PHATE coordinates
Wrote /soe/apblair/sysbio_apblair/single_cell/chi/cellBrowserOut/combined_EF13W3D/cellbrowser.conf

After using my snippet of code to write the cell_to_cluster.tsv file to the 'combined_EF13W3D' directory, I was able to run cbBuild with the following command:

./cbBuild -i ../../single_cell/chi/cellBrowserOut/combined_EF13W3D/cellbrowser.conf -o ~/.html

However, it appears there is a column heading error:

INFO:root:/projects/sysbio/users/apblair/single_cell/chi/cellBrowserOut/combined_EF13W3D/summary.html does not exist
INFO:root:/projects/sysbio/users/apblair/single_cell/chi/cellBrowserOut/combined_EF13W3D/methods.html does not exist
INFO:root:/projects/sysbio/users/apblair/single_cell/chi/cellBrowserOut/combined_EF13W3D/downloads.html does not exist
INFO:root:/projects/sysbio/users/apblair/single_cell/chi/cellBrowserOut/combined_EF13W3D/thumb.png does not exist
INFO:root:Getting md5 of /projects/sysbio/users/apblair/single_cell/chi/cellBrowserOut/combined_EF13W3D/cell_to_cluster.tsv
INFO:root:Creating /soe/apblair/.html/combined_EF13W3D/metaFields
INFO:root:Checking and reordering meta data to /soe/apblair/.html/combined_EF13W3D/meta.tsv
INFO:root:Reading sample names from /projects/sysbio/users/apblair/single_cell/chi/cellBrowserOut/combined_EF13W3D/cell_to_cluster.tsv
INFO:root:Reading headers of file /projects/sysbio/users/apblair/single_cell/chi/cellBrowserOut/combined_EF13W3D/exprMatrix.tsv.gz
INFO:root:Data contains 23594 samples/cells
INFO:root:Converting to numbers and compressing meta data fields
INFO:root:Meta data field index 0: 'Cells'
INFO:root:Type: uniqueString, 23594 different values
INFO:root:Meta data field index 1: 'louvain'
INFO:root:Number of values per decile-bin: [8305, 5255, 2768, 1568, 2230, 1279, 998, 633, 483, 75]
INFO:root:Type: int, 23 different values
INFO:root:Indexing meta file /soe/apblair/.html/combined_EF13W3D/meta.tsv to /soe/apblair/.html/combined_EF13W3D/meta.index
INFO:root:Kept 23594 cells present in both meta data file and expression matrix
INFO:root:Getting md5 of /soe/apblair/.html/combined_EF13W3D/meta.tsv
INFO:root:Determining if /soe/apblair/.html/combined_EF13W3D/exprMatrix.tsv.gz needs to be created
INFO:root:/soe/apblair/.html/combined_EF13W3D/exprMatrix.tsv.gz does not exist.
INFO:root:Getting md5 of /projects/sysbio/users/apblair/single_cell/chi/cellBrowserOut/combined_EF13W3D/exprMatrix.tsv.gz
INFO:root:Copying/compressing /projects/sysbio/users/apblair/single_cell/chi/cellBrowserOut/combined_EF13W3D/exprMatrix.tsv.gz to /soe/apblair/.html/combined_EF13W3D/exprMatrix.tsv.gz
INFO:root:converting /soe/apblair/.html/combined_EF13W3D/exprMatrix.tsv.gz to /soe/apblair/.html/combined_EF13W3D/exprMatrix.bin and writing index to /soe/apblair/.html/combined_EF13W3D/exprMatrix.json
INFO:root:Compressing gene expression vectors...
INFO:root:Auto-detecting number type of /soe/apblair/.html/combined_EF13W3D/exprMatrix.tsv.gz
INFO:root:Numbers in matrix are of type 'float'
INFO:root:Wrote expression values for 1000 genes
INFO:root:Wrote expression values for 2000 genes
INFO:root:Wrote expression values for 3000 genes
INFO:root:Wrote expression values for 4000 genes
INFO:root:Wrote expression values for 5000 genes
INFO:root:Wrote expression values for 6000 genes
INFO:root:Wrote expression values for 7000 genes
INFO:root:Wrote expression values for 8000 genes
INFO:root:Wrote expression values for 9000 genes
INFO:root:Wrote expression values for 10000 genes
INFO:root:Wrote expression values for 11000 genes
INFO:root:Wrote expression values for 12000 genes
INFO:root:Wrote expression values for 13000 genes
INFO:root:Getting md5 of /soe/apblair/.html/combined_EF13W3D/exprMatrix.tsv.gz
INFO:root:Wrote /soe/apblair/.html/combined_EF13W3D/cellbrowser.json.bak
INFO:root:Wrote /soe/apblair/.html/combined_EF13W3D/cellbrowser.json.bak
INFO:root:Wrote /soe/apblair/.html/combined_EF13W3D/dataset.json
INFO:root:Wrote /soe/apblair/.html/combined_EF13W3D/dataset.json
INFO:root:Parsing column Louvain CLuster from /soe/apblair/.html/combined_EF13W3D/meta.tsv
Traceback (most recent call last):
  File "./cbBuild", line 9, in <module>
    cellbrowser.convertAndCopyCli()
  File "./cbPyLib/cellbrowser.py", line 2405, in convertAndCopyCli
    convertAndCopy(confFnames, outDir, port)
  File "./cbPyLib/cellbrowser.py", line 2376, in convertAndCopy
    convertDataset(inConf, outConf, datasetDir)
  File "./cbPyLib/cellbrowser.py", line 2149, in convertDataset
    convertCoords(inConf, outConf, sampleNames, outMeta, datasetDir)
  File "./cbPyLib/cellbrowser.py", line 1866, in convertCoords
    labelVec, labelVals = parseTsvColumn(outMeta, clusterLabelField)
  File "./cbPyLib/cellbrowser.py", line 1659, in parseTsvColumn
    vals = parseOneColumn(fname, colName)
  File "./cbPyLib/cellbrowser.py", line 223, in parseOneColumn
    colIdx = headers.index(colName)
ValueError: 'Louvain CLuster' is not in list

After changing my 'cell_to_cluster.tsv' to the appropriate column heading ('Louvain CLuster') I was successfully able to build a cell viewer session.
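
The `ValueError` above arises because the conf names a column ('Louvain CLuster') that the metadata header ('Cells', 'louvain') does not contain. A minimal sketch of a pre-flight check for this mismatch follows, assuming a tab-separated metadata file with a single header row and a conf already loaded into a dict. `missing_meta_fields` is a hypothetical helper, not part of cellBrowser.

```python
import csv


def missing_meta_fields(meta_path, conf):
    """Return conf field names that are absent from the meta file header."""
    with open(meta_path, newline="") as handle:
        headers = next(csv.reader(handle, delimiter="\t"))
    wanted = []
    for key in ("clusterField", "labelField"):
        if key in conf:
            wanted.append(conf[key])
    wanted.extend(conf.get("enumFields", []))
    # Preserve order while dropping duplicates.
    unique = list(dict.fromkeys(wanted))
    return [name for name in unique if name not in headers]
```

An empty result means every field the conf refers to is present in the header; otherwise the returned names are the columns to rename or add.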

maximilianh commented 5 years ago

Oops, very sorry for this stupid typo. Fixed now. I also fixed the wrong filename issue: it's meta.tsv everywhere now, not cell_to_cluster.tsv.