linnarsson-lab / loompy

Python implementation of the Loom file format - http://loompy.org
BSD 2-Clause "Simplified" License

Saving `IndexError: list index out of range` #155

Open abearab opened 3 years ago

abearab commented 3 years ago

Here are my command and the log output from running loompy fromfq. It stopped with the error below, and I have no idea how to deal with it. I would appreciate your help in this matter.

loompy fromfq sample.loom sample human_GRCh38_gencode.v38 metadata.tab sample_L002_R1_001.fastq.gz sample_S7_L002_R2_001.fastq.gz
Loompy v3.0.6 by Linnarsson Lab 🌸 (http://linnarssonlab.org & http://loompy.org)

2021-06-07 10:31:50,815 - INFO - Using 1 threads.
2021-06-07 10:31:50,815 - INFO - kallisto bus -i human_GRCh38_gencode.v38/gencode.fragments.idx -o /tmp/tmp3dtetu4d -x 10xv3 -t 1 sample_L002_R1_001.fastq.gz sample_L002_R2_001.fastq.gz
2021-06-07 10:31:50,855 - INFO - [index] k-mer length: 31
2021-06-07 10:31:50,855 - INFO - [index] number of targets: 860,619
2021-06-07 10:31:50,856 - INFO - [index] number of k-mers: 273,153,399
2021-06-07 10:32:27,658 - INFO - [index] number of equivalence classes: 4,822,434
2021-06-07 10:32:49,516 - INFO - [quant] will process sample 1: sample_L002_R1_001.fastq.gz
2021-06-07 10:32:49,516 - INFO -                                sample_S7_L002_R2_001.fastq.gz
2021-06-07 12:13:05,754 - INFO - [quant] finding pseudoalignments for the reads ... done
2021-06-07 12:13:05,779 - INFO - [quant] processed 350,060,828 reads, 229,943,776 reads pseudoaligned
2021-06-07 12:13:24,766 - INFO - Loading gene metadata
2021-06-07 12:13:25,307 - INFO - Loading fragments-to-gene mappings
2021-06-07 12:13:26,077 - INFO - Indexing genes
2021-06-07 12:13:26,631 - INFO - Loading equivalence classes
2021-06-07 12:14:00,252 - INFO - Mapping equivalence classes to genes
2021-06-07 12:14:13,929 - INFO - Loading fragment IDs
2021-06-07 12:14:14,299 - INFO - Loading BUS records
2021-06-07 12:21:11,401 - INFO - Sorting cell IDs
2021-06-07 12:21:29,471 - INFO - Found 229,943,776 records for 60,708 genes and 6,468,237 uncorrected cell barcodes.
2021-06-07 12:21:29,471 - INFO - Correcting cell barcodes
2021-06-07 12:30:57,767 - INFO - Found 2,067,564 corrected cell barcodes.
2021-06-07 12:30:57,767 - INFO - Removing redundant reads using UMIs
2021-06-07 12:37:17,745 - INFO - 32% sequencing saturation.
2021-06-07 12:37:17,745 - INFO - Counting pseudoalignments for main matrix
2021-06-07 12:38:46,650 - INFO - Found 59,679,581 UMIs.
2021-06-07 12:38:47,780 - INFO - Counting pseudoalignments for layer 'unspliced'
2021-06-07 12:46:15,506 - INFO - Found 18,861,577 UMIs.
2021-06-07 12:46:16,372 - INFO - Counting pseudoalignments for layer 'spliced'
2021-06-07 12:58:31,854 - INFO - Found 47,545,531 UMIs.
2021-06-07 12:58:31,854 - INFO - Calling cells
2021-06-07 13:05:17,823 - INFO - Found 13019 valid cells and ~4584 ambient UMIs.
2021-06-07 13:05:17,823 - INFO - Creating loom file 'sample.loom'
2021-06-07 13:05:17,823 - INFO - Saving
Traceback (most recent call last):
  File "/home/labadmin/anaconda3/envs/alignment/bin/loompy", line 10, in <module>
    sys.exit(cli())
  File "/home/labadmin/anaconda3/envs/alignment/lib/python3.9/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/home/labadmin/anaconda3/envs/alignment/lib/python3.9/site-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "/home/labadmin/anaconda3/envs/alignment/lib/python3.9/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/labadmin/anaconda3/envs/alignment/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/labadmin/anaconda3/envs/alignment/lib/python3.9/site-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "/home/labadmin/anaconda3/envs/alignment/lib/python3.9/site-packages/loompy/commands.py", line 34, in fromfq
    create_from_fastq(loomfile, sampleid, list(fastqs), indexdir, metadatafile, threads)
  File "/home/labadmin/anaconda3/envs/alignment/lib/python3.9/site-packages/loompy/bus_file.py", line 455, in create_from_fastq
    bus.save(out_file, sample_id, samples_metadata_file)
  File "/home/labadmin/anaconda3/envs/alignment/lib/python3.9/site-packages/loompy/bus_file.py", line 328, in save
    row_attrs[attr] = [v[i] for v in self.genes.values()]
  File "/home/labadmin/anaconda3/envs/alignment/lib/python3.9/site-packages/loompy/bus_file.py", line 328, in <listcomp>
    row_attrs[attr] = [v[i] for v in self.genes.values()]
IndexError: list index out of range

SinhaI commented 3 years ago

Hi, I am getting the same error when running the loompy fromfq command. I am running it with an hg19 genome index that I created using build_index.ipynb. Please see the command and log contents below. Thanks for your help.

loompy fromfq JMJ9J11.loom JMJnonInf hg19 metadata.tab JMJ9_R1.fastq.gz JMJ9_R2.fastq.gz JMJ11_R1.fastq.gz JMJ11_R2.fastq.gz


/home/indranil/.local/lib/python3.9/site-packages/numba-0.53.1-py3.9-linux-x86_64.egg/numba/np/ufunc/parallel.py:365: NumbaWarning: The TBB threading layer requires TBB version 2019.5 or later i.e., TBB_INTERFACE_VERSION >= 11005. Found TBB_INTERFACE_VERSION = 6103. The TBB threading layer is disabled.
  warnings.warn(problem)
Loompy v3.0.6 by Linnarsson Lab 🌸 (http://linnarssonlab.org & http://loompy.org)

2021-09-17 20:38:30,549 - INFO - Using 20 threads.
2021-09-17 20:38:30,893 - INFO - kallisto bus -i hg19/gencode.fragments.idx -o /scratch/22224580/tmprbdu15du -x 10xv2 -t 20 JMJ9_R1.fastq.gz JMJ9_R2.fastq.gz JMJ11_R1.fastq.gz JMJ11_R2.fastq.gz
2021-09-17 20:38:31,249 - INFO - [index] k-mer length: 31
2021-09-17 20:38:31,250 - INFO - [index] number of targets: 859,837
2021-09-17 20:38:31,250 - INFO - [index] number of k-mers: 272,747,089
2021-09-17 20:39:06,705 - INFO - [index] number of equivalence classes: 4,812,424
2021-09-17 20:39:22,967 - INFO - [quant] will process sample 1: JMJ9_R1.fastq.gz
2021-09-17 20:39:22,967 - INFO -                                JMJ9_R2.fastq.gz
2021-09-17 20:39:22,967 - INFO - [quant] will process sample 2: JMJ11_R1.fastq.gz
2021-09-17 20:39:22,967 - INFO -                                JMJ11_R2.fastq.gz
2021-09-17 20:56:14,197 - INFO - [quant] finding pseudoalignments for the reads ... done
2021-09-17 20:56:14,198 - INFO - [quant] processed 301,065,488 reads, 240,926,932 reads pseudoaligned
2021-09-17 20:56:25,693 - INFO - Loading gene metadata
2021-09-17 20:56:26,672 - INFO - Loading fragments-to-gene mappings
2021-09-17 20:56:32,105 - INFO - Indexing genes
2021-09-17 20:56:32,692 - INFO - Loading equivalence classes
2021-09-17 20:57:09,930 - INFO - Mapping equivalence classes to genes
2021-09-17 20:57:24,476 - INFO - Loading fragment IDs
2021-09-17 20:57:24,948 - INFO - Loading BUS records
2021-09-17 21:04:08,597 - INFO - Sorting cell IDs
2021-09-17 21:04:26,744 - INFO - Found 240,926,932 records for 62,446 genes and 1,356,852 uncorrected cell barcodes.
2021-09-17 21:04:26,744 - INFO - Correcting cell barcodes
2021-09-17 21:08:37,627 - INFO - Found 2 corrected cell barcodes.
2021-09-17 21:08:37,627 - INFO - Removing redundant reads using UMIs
2021-09-17 21:13:46,120 - INFO - 99% sequencing saturation.
2021-09-17 21:13:46,122 - INFO - Counting pseudoalignments for main matrix
2021-09-17 21:13:46,234 - INFO - Found 19 UMIs.
2021-09-17 21:13:48,344 - INFO - Counting pseudoalignments for layer 'unspliced'
2021-09-17 21:14:55,086 - INFO - Found 13 UMIs.
2021-09-17 21:14:56,146 - INFO - Counting pseudoalignments for layer 'spliced'
2021-09-17 21:15:44,395 - INFO - Found 6 UMIs.
2021-09-17 21:15:44,395 - INFO - Calling cells
2021-09-17 21:15:44,415 - WARNING - /sw/comp/python/3.9.5/rackham/lib/python3.9/site-packages/numpy/core/fromnumeric.py:3419: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,

2021-09-17 21:15:44,417 - WARNING - /sw/comp/python/3.9.5/rackham/lib/python3.9/site-packages/numpy/core/_methods.py:188: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)

2021-09-17 21:15:44,417 - WARNING - No ambient RNA beads were found; maybe sample had too few cells?
2021-09-17 21:15:44,418 - INFO - Found 0 valid cells and ~18 ambient UMIs.
2021-09-17 21:15:44,418 - INFO - Creating loom file 'JMJ9J11.loom'
2021-09-17 21:15:44,419 - INFO - Saving
Traceback (most recent call last):
  File "/home/indranil/.local/bin/loompy", line 8, in <module>
    sys.exit(cli())
  File "/sw/comp/python/3.9.5/rackham/lib/python3.9/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/sw/comp/python/3.9.5/rackham/lib/python3.9/site-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "/sw/comp/python/3.9.5/rackham/lib/python3.9/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/sw/comp/python/3.9.5/rackham/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/sw/comp/python/3.9.5/rackham/lib/python3.9/site-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "/home/indranil/.local/lib/python3.9/site-packages/loompy/commands.py", line 34, in fromfq
    create_from_fastq(loomfile, sampleid, list(fastqs), indexdir, metadatafile, threads)
  File "/home/indranil/.local/lib/python3.9/site-packages/loompy/bus_file.py", line 455, in create_from_fastq
    bus.save(out_file, sample_id, samples_metadata_file)
  File "/home/indranil/.local/lib/python3.9/site-packages/loompy/bus_file.py", line 328, in save
    row_attrs[attr] = [v[i] for v in self.genes.values()]
  File "/home/indranil/.local/lib/python3.9/site-packages/loompy/bus_file.py", line 328, in <listcomp>
    row_attrs[attr] = [v[i] for v in self.genes.values()]
IndexError: list index out of range

lvlahos343 commented 2 years ago

I've also run into this exact same issue. Has there been any resolution on this front from the devs or others who ran into this problem?

mughetta commented 2 years ago

I was also having this issue and was able to solve it. To do this, I first had to look at the surrounding code for this error:

    # Transpose the gene metadata
    for i, attr in enumerate(self.gene_metadata_attributes):
        row_attrs[attr] = [v[i] for v in self.genes.values()]

I realized that this IndexError meant the counter i had grown beyond the length of one of the lists (v) in self.genes.values() in the list comprehension: [v[i] for v in self.genes.values()]. Since i counts through self.gene_metadata_attributes, that meant self.gene_metadata_attributes was longer than at least one v. Here, gene_metadata_attributes is a list of the metadata attribute titles, and self.genes.values() holds one list per gene, with each list containing that gene's value for every metadata attribute. Each of those per-gene lists is therefore supposed to be the same length as the gene_metadata_attributes list. Basically, this IndexError occurs because there are more metadata attribute titles saved in self.gene_metadata_attributes than metadata values saved for some gene in self.genes.values().
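The mismatch can be reproduced outside loompy with a minimal sketch. The names `gene_metadata_attributes` and `genes` mirror the attributes in the traceback, but the accessions, gene names, and values below are made-up placeholders, not real loompy data:

```python
# Toy stand-ins for the structures named in the traceback (hypothetical values).
gene_metadata_attributes = ["Accession", "Gene", "VegasID"]  # three titles
genes = {
    "ACC0001": ["ACC0001", "GeneA"],  # only two values per gene
    "ACC0002": ["ACC0002", "GeneB"],
}

row_attrs = {}
failed_attr = None
for i, attr in enumerate(gene_metadata_attributes):
    try:
        # Same transpose as in the traceback: take column i from every gene.
        row_attrs[attr] = [v[i] for v in genes.values()]
    except IndexError:
        # i has reached a title that has no matching value in the per-gene lists.
        failed_attr = attr
        break

print("Transposed attributes:", list(row_attrs))
print("First attribute without values:", failed_attr)
```

The first two titles transpose fine, and the loop only fails once it reaches the extra title, which is why the error appears at the very end of the run, during saving.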

In my case, when I inspected the self.gene_metadata_attributes list, I found that it still had the title for the VegasID, which was being read from gencode.vM23.metadata.tab. Earlier, I had removed the code that added the VegasIDs themselves to gencode.vM23.primary_assembly.annotation.gtf, since BioMart does not provide these IDs anymore. However, I had not removed this other reference to VegasIDs in the code.

To fix my error, I went back to the mouse_build.py file on my computer and removed the place where the "VegasID" attribute was being added to gencode.vM23.metadata.tab, on line 71. In general, you can solve this issue by making sure you add the same number of metadata values to the gencode.vM23.primary_assembly.annotation.gtf file as attribute names to gencode.vM23.metadata.tab.
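One way to spot this kind of mismatch before running loompy fromfq is to compare each row of the metadata file against its header. This is only a sketch: it assumes the file is tab-separated with a header row of attribute names followed by one row per gene, and `find_mismatched_rows` is a hypothetical helper, not part of loompy. The sample data is made up.

```python
import csv
import io


def find_mismatched_rows(lines):
    """Return the header and a list of (line_number, n_values) for rows
    whose number of values differs from the number of header titles."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    bad = []
    for line_no, row in enumerate(reader, start=2):
        if row and len(row) != len(header):  # ignore completely empty lines
            bad.append((line_no, len(row)))
    return header, bad


# Made-up example: three titles, but the first gene row has only two values.
sample = io.StringIO(
    "Accession\tGene\tVegasID\n"
    "ACC0001\tGeneA\n"
    "ACC0002\tGeneB\tX1\n"
)
header, bad = find_mismatched_rows(sample)
print(f"{len(header)} attribute titles; mismatched rows: {bad}")
```

To check a real file, pass an open file handle instead of the `StringIO` object, for example `open("gencode.vM23.metadata.tab", newline="")`. Any row reported here has fewer (or more) values than there are titles and would be a candidate for the IndexError above.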