linnarsson-lab / loompy

Python implementation of the Loom file format - http://loompy.org
BSD 2-Clause "Simplified" License

Error decoding run_info.json #140

Open MagpiePKU opened 3 years ago

MagpiePKU commented 3 years ago

Hi,

Thanks for creating this very useful tool. We previously ran loompy without errors, but recently (after a few other package installations) we have run into a problem when generating new loom files with loompy fromfq:

Log: Loompy v3.0.6 by Linnarsson Lab 🌸 (http://linnarssonlab.org & http://loompy.org)

2020-11-13 16:09:33,094 - INFO - Using 24 threads.
2020-11-13 16:09:33,117 - INFO - kallisto bus -i /gpfs/genomedb/kallisto/loompy/human_GRCh38_gencode.v31.600/gencode.v31.fragments.idx -o /tmp/tmp5iro5obk -x 10xv3 -t 24 /gpfs/datatransfer/c03b01n04/sdb1-20201110_17/LBFC20201489/201107_A00682_0472_BHKY5KDSXY/20044378S4-10XSC3_S5_L003_R1_001.fastq.gz /gpfs/datatransfer/c03b01n04/sdb1-20201110_17/LBFC20201489/201107_A00682_0472_BHKY5KDSXY/20044378S4-10XSC3_S5_L003_R2_001.fastq.gz /gpfs/datatransfer/c03b01n04/sdb1-20201110_17/LBFC20201489/201107_A00682_0472_BHKY5KDSXY/20044378S4-10XSC3_S6_L003_R1_001.fastq.gz /gpfs/datatransfer/c03b01n04/sdb1-20201110_17/LBFC20201489/201107_A00682_0472_BHKY5KDSXY/20044378S4-10XSC3_S6_L003_R2_001.fastq.gz /gpfs/datatransfer/c03b01n04/sdb1-20201110_17/LBFC20201489/201107_A00682_0472_BHKY5KDSXY/20044378S4-10XSC3_S7_L003_R1_001.fastq.gz /gpfs/datatransfer/c03b01n04/sdb1-20201110_17/LBFC20201489/201107_A00682_0472_BHKY5KDSXY/20044378S4-10XSC3_S7_L003_R2_001.fastq.gz
2020-11-13 16:09:33,328 - INFO - [index] k-mer length: 31
2020-11-13 16:09:33,328 - INFO - [index] number of targets: 845,338
2020-11-13 16:09:33,328 - INFO - [index] number of k-mers: 271,648,279
2020-11-13 16:10:03,506 - INFO - [index] number of equivalence classes: 4,776,424
2020-11-13 16:10:18,297 - INFO - [quant] will process sample 1: /gpfs/datatransfer/c03b01n04/sdb1-20201110_17/LBFC20201489/201107_A00682_0472_BHKY5KDSXY/20044378S4-10XSC3_S5_L003_R1_001.fastq.gz
2020-11-13 16:10:18,299 - INFO - /gpfs/datatransfer/c03b01n04/sdb1-20201110_17/LBFC20201489/201107_A00682_0472_BHKY5KDSXY/20044378S4-10XSC3_S5_L003_R2_001.fastq.gz
2020-11-13 16:10:18,299 - INFO - [quant] will process sample 2: /gpfs/datatransfer/c03b01n04/sdb1-20201110_17/LBFC20201489/201107_A00682_0472_BHKY5KDSXY/20044378S4-10XSC3_S6_L003_R1_001.fastq.gz
2020-11-13 16:10:18,300 - INFO - /gpfs/datatransfer/c03b01n04/sdb1-20201110_17/LBFC20201489/201107_A00682_0472_BHKY5KDSXY/20044378S4-10XSC3_S6_L003_R2_001.fastq.gz
2020-11-13 16:10:18,300 - INFO - [quant] will process sample 3: /gpfs/datatransfer/c03b01n04/sdb1-20201110_17/LBFC20201489/201107_A00682_0472_BHKY5KDSXY/20044378S4-10XSC3_S7_L003_R1_001.fastq.gz
2020-11-13 16:10:18,300 - INFO - /gpfs/datatransfer/c03b01n04/sdb1-20201110_17/LBFC20201489/201107_A00682_0472_BHKY5KDSXY/20044378S4-10XSC3_S7_L003_R2_001.fastq.gz
2020-11-13 17:09:53,297 - INFO - [quant] finding pseudoalignments for the reads ... done
2020-11-13 17:09:53,300 - INFO - [quant] processed 946,138,749 reads, 683,048,253 reads pseudoaligned
2020-11-13 17:09:58,179 - ERROR - Error decoding run_info.json: Expecting value: line 1 column 1 (char 0)
2020-11-13 17:09:58,180 - INFO - Loading gene metadata
2020-11-13 17:09:58,592 - INFO - Loading fragments-to-gene mappings
2020-11-13 17:09:59,165 - INFO - Indexing genes
2020-11-13 17:09:59,668 - INFO - Loading equivalence classes
2020-11-13 17:09:59,668 - INFO - Mapping equivalence classes to genes
2020-11-13 17:09:59,668 - INFO - Loading fragment IDs
2020-11-13 17:09:59,669 - INFO - Loading BUS records
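The ERROR line in this log is the message Python's json module produces when it parses empty input, which suggests (an inference, not something confirmed in this thread) that the run_info.json file written during the kallisto step was empty. A minimal sketch reproducing that message; read_run_info is a hypothetical helper, not loompy's code:

```python
import json
import os
import tempfile


def read_run_info(path):
    """Hypothetical helper: parse run_info.json, returning None instead of
    raising when the file is empty or not valid JSON."""
    with open(path) as f:
        text = f.read()
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # Empty input fails at the very first character:
        # "Expecting value: line 1 column 1 (char 0)"
        print(f"Error decoding {os.path.basename(path)}: {err}")
        return None


# Simulate an empty run_info.json in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "run_info.json")
    open(path, "w").close()  # creates a zero-byte file
    result = read_run_info(path)
```

Running it prints the same "Error decoding run_info.json: Expecting value: line 1 column 1 (char 0)" text seen above, which is why checking whether that file is empty is a reasonable first diagnostic.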

Err:
Traceback (most recent call last):
  File "/gpfs/bin/anaconda3-gpu/bin/loompy", line 8, in <module>
    sys.exit(cli())
  File "/gpfs/bin/anaconda3-gpu/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/gpfs/bin/anaconda3-gpu/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/gpfs/bin/anaconda3-gpu/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/gpfs/bin/anaconda3-gpu/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/gpfs/bin/anaconda3-gpu/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/gpfs/bin/anaconda3-gpu/lib/python3.8/site-packages/loompy/commands.py", line 34, in fromfq
    create_from_fastq(loomfile, sampleid, list(fastqs), indexdir, metadatafile, threads)
  File "/gpfs/bin/anaconda3-gpu/lib/python3.8/site-packages/loompy/bus_file.py", line 429, in create_from_fastq
    bus = BusFile(
  File "/gpfs/bin/anaconda3-gpu/lib/python3.8/site-packages/loompy/bus_file.py", line 239, in __init__
    self.bus_gene = np.array([self.gene_for_ec[x] for x in self.bus["equivalence_class"]], dtype=np.int32)
  File "/gpfs/bin/anaconda3-gpu/lib/python3.8/site-packages/loompy/bus_file.py", line 239, in <listcomp>
    self.bus_gene = np.array([self.gene_for_ec[x] for x in self.bus["equivalence_class"]], dtype=np.int32)
KeyError: 658005
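The last frame maps each BUS record's equivalence class to a gene through a dictionary lookup, so the KeyError means the BUS output contains an equivalence-class ID (658005 here) with no entry in the equivalence-class-to-gene mapping. A toy reproduction with made-up values (only the same lookup pattern, not loompy's real data), plus a hypothetical diagnostic that lists unmapped IDs up front instead of failing on the first one:

```python
import numpy as np

# Toy stand-ins with made-up values: an equivalence-class -> gene mapping and
# the equivalence-class column of a few BUS records.
gene_for_ec = {0: 10, 1: 11, 2: 10}
bus_equivalence_class = np.array([0, 2, 1, 658005, 2], dtype=np.int64)


def unmapped_ecs(ecs, mapping):
    """Hypothetical diagnostic: sorted equivalence-class IDs with no gene mapping."""
    return sorted({int(x) for x in np.unique(ecs)} - set(mapping))


missing = unmapped_ecs(bus_equivalence_class, gene_for_ec)
print("unmapped equivalence classes:", missing)

# Same lookup pattern as bus_file.py line 239: the first unmapped ID raises KeyError.
failed_key = None
try:
    bus_gene = np.array([gene_for_ec[x] for x in bus_equivalence_class], dtype=np.int32)
except KeyError as err:
    failed_key = int(err.args[0])
    print("KeyError for equivalence class", failed_key)
```

NumPy integer scalars hash and compare like Python ints, so the lookup behaves exactly as it would with plain integers; only the missing ID triggers the error.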

We wonder how we could fix it.

Thanks a lot, Yi

slinnarsson commented 3 years ago

Hi! I've seen this occasionally, and it might be related to kallisto. Could you try updating kallisto? Alternatively, there's a new tool called "kb" from the Pachter lab, which is essentially equivalent to loompy fromfq (but faster, since it's C++) and generates loom files.

josinejansen commented 3 years ago

Hi, I am trying to create a loom file so I can continue with the scVelo package, but right now I am running into this error:

"Manifest file 'manifest.json' was missing from index at 'GBM020420-TIL_HHT_S6_L002_R2_001.fastq.gz'"

This is my code:
loompy.create_from_fastq('myloombasedon_fq', 'human_GRCh38_gencode.v31', adata.var, 'GBM-TIL_HHT_S6_L002_R2_001.fastq.gz', 'GBM-TIL_HHT_S6_L002_R1_001.fastq.gz', 'GBM-BL_HHT_S6_L002_R2_001.fastq.gz', 'GBM-BL_HHT_S6_L002_R1_001.fastq.gz')

I downloaded the fastq.gz files from the Cell Ranger output link I received, but as far as I am aware, the directory does not contain any manifest.json file.
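The traceback in the first post shows loompy's fromfq command calling create_from_fastq(loomfile, sampleid, list(fastqs), indexdir, metadatafile, threads), i.e. output file, sample ID, one list of FASTQ paths, index location, metadata file. In the call above the FASTQ paths are passed as separate arguments, so the first R2 file lands in the index slot, which would explain an error about manifest.json missing "from index at" a .fastq.gz path. A hypothetical pre-flight check under those assumptions; check_fromfq_args, the sample name, and the metadata file name are illustrative, not part of loompy, and the manifest.json rule is inferred from the error message:

```python
import os
import tempfile

FASTQ_SUFFIXES = (".fastq", ".fastq.gz", ".fq", ".fq.gz")


def check_fromfq_args(loomfile, sampleid, fastqs, indexdir, metadatafile):
    """Hypothetical check (not part of loompy) for arguments in the order
    create_from_fastq(loomfile, sampleid, fastqs, indexdir, metadatafile)
    seen in the traceback. Returns a list of problems; empty means none found."""
    problems = []
    if not isinstance(sampleid, str):
        problems.append("sampleid should be a string naming the sample")
    # fastqs must be ONE list holding all FASTQ paths, not separate arguments.
    if not isinstance(fastqs, (list, tuple)) or not fastqs or not all(
        isinstance(f, str) and f.endswith(FASTQ_SUFFIXES) for f in fastqs
    ):
        problems.append("fastqs should be a non-empty list of FASTQ paths")
    # Inferred from the error message: the index location holds manifest.json.
    if not (isinstance(indexdir, str) and os.path.isfile(os.path.join(indexdir, "manifest.json"))):
        problems.append(f"no manifest.json found in index location {indexdir!r}")
    if not isinstance(metadatafile, str):
        problems.append("metadatafile should be a path to the sample metadata file")
    return problems


# Argument order from the question; a string stands in for adata.var here.
as_posted = check_fromfq_args(
    "myloombasedon_fq",
    "human_GRCh38_gencode.v31",
    "adata.var",
    "GBM-TIL_HHT_S6_L002_R2_001.fastq.gz",
    "GBM-TIL_HHT_S6_L002_R1_001.fastq.gz",
)
for problem in as_posted:
    print("-", problem)

# Reordered: one sample, its FASTQs in a single list (R1 before R2, as in the
# kallisto bus command in the first log), and an index directory containing
# manifest.json (an empty placeholder created here only for the check).
with tempfile.TemporaryDirectory() as index_dir:
    open(os.path.join(index_dir, "manifest.json"), "w").close()
    reordered = check_fromfq_args(
        "GBM-TIL.loom",
        "GBM-TIL",
        ["GBM-TIL_HHT_S6_L002_R1_001.fastq.gz", "GBM-TIL_HHT_S6_L002_R2_001.fastq.gz"],
        index_dir,
        "samples_metadata.tab",
    )
print("problems after reordering:", reordered)
```

The FASTQ download alone would not supply the index; in the first log, loompy's index lives in its own directory (human_GRCh38_gencode.v31.600).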

These are the only files I can download directly:

[screenshot]

Sample 1 (TUMOR)

[screenshot]

Sample 2 (BLOOD)

[screenshot]

What am I doing wrong? I think I have installed loompy correctly.

My second (and last) question is about the metadata file: can I just pass the AnnData .var data? I don't really understand the requirement that one metadata column must be named 'name', another 'technology' and a third 'targetnumcells'. Or does it mean I have to modify adata.var and add these three columns?
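adata.var describes genes (one row per gene), while 'name', 'technology' and 'targetnumcells' describe samples, so one reading (an interpretation, not confirmed in this thread) is that the metadata file is a separate table with one row per sample whose 'name' matches the sample ID passed to create_from_fastq. A minimal sketch that writes such a table, assuming a tab-separated text file with a header row; the sample names and targetnumcells values are placeholders, and '10xv3' is copied from the kallisto -x option in the first log, so check which chemistry applies to your data:

```python
import csv
import os
import tempfile

# Placeholder per-sample rows (hypothetical values): one row per sample,
# not one row per gene as in adata.var.
SAMPLES = [
    {"name": "GBM-TIL", "technology": "10xv3", "targetnumcells": 5000},
    {"name": "GBM-BL", "technology": "10xv3", "targetnumcells": 5000},
]


def write_sample_metadata(path, samples):
    """Write the samples as a tab-separated file with a header row
    (assumed format; confirm against the loompy documentation)."""
    fields = ["name", "technology", "targetnumcells"]
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields, delimiter="\t", lineterminator="\n")
        writer.writeheader()
        writer.writerows(samples)


with tempfile.TemporaryDirectory() as tmp:
    meta_path = os.path.join(tmp, "samples_metadata.tab")
    write_sample_metadata(meta_path, SAMPLES)
    with open(meta_path) as f:
        contents = f.read()
print(contents)
```

Under this reading, adata.var itself would stay unchanged; the sample ID given to create_from_fastq would be one of the values in the 'name' column.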

Right now my adata.var looks like this:

[screenshot]

Using kb, the loompy equivalent, doesn't seem to be an option for me because access to the tutorial is denied (I am very dependent on tutorials!)

[screenshot]

My Python version is 3.7.6.

In the fastq files from the tutorial (pbmc_1k_v3_fastqs.tar) I also don't see a .json file.

Your help would be greatly appreciated!

Josine