Closed (slowkow closed 3 years ago)
Is this issue related to the "Dual Index"?
This particular job has a sample sheet with two values for the column Index:
SI-TT-A1
SI-NT-A1
I would suggest changing this special if-statement:
Right now it is looking for samples where Index was not set. We infer that Index is not set when it is missing the "-" character.
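To make the problem concrete, here is a minimal sketch of the check as described above. The helper name `index_looks_unset` is hypothetical and this is not the actual workflow code; it only mirrors the rule "Index is not set when it is missing the - character".

```python
def index_looks_unset(index: str) -> bool:
    """Hypothetical helper: treat Index as 'not set' when it has no '-'."""
    return "-" not in index


# The two Index values from the sample sheet in this job.
for index in ["SI-TT-A1", "SI-NT-A1"]:
    # Both values contain '-', so the folder-creating branch would be skipped.
    print(index, index_looks_unset(index))
```

Under this rule both dual-index values count as "set", so any code guarded by the check never runs for them.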
Instead, the code should look for FASTQ files that are not in a {sample_id} folder. If files are found, then a new {sample_id} folder should be created and the files moved there.
In other words, the code inside the if-statement should always be running, regardless of whether or not Index contains the "-" character.
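A rough sketch of that suggestion, assuming a local directory rather than a bucket. The function name `move_loose_fastqs` and its arguments are made up for illustration and are not part of the workflow. It runs for every sample, regardless of Index, and only acts when loose FASTQ files are actually found.

```python
import re
import shutil
from pathlib import Path
from typing import Iterable, List


def move_loose_fastqs(flowcell_dir: Path, sample_ids: Iterable[str]) -> List[Path]:
    """Move {sample_id}_S*.fastq.gz files that sit directly in flowcell_dir
    into a {sample_id} subfolder, creating the folder if needed."""
    moved = []
    for sample_id in sample_ids:
        # Match only files named exactly {sample_id}_S<number>_..., so that
        # "sampleA" does not also pick up files of "sampleA_extra".
        pattern = re.compile(re.escape(sample_id) + r"_S\d+_.*\.fastq\.gz$")
        loose = sorted(
            p for p in flowcell_dir.iterdir()
            if p.is_file() and pattern.match(p.name)
        )
        if not loose:
            continue  # nothing to do; files are already in place or absent
        target = flowcell_dir / sample_id
        target.mkdir(exist_ok=True)
        for path in loose:
            dest = target / path.name
            shutil.move(str(path), str(dest))
            moved.append(dest)
    return moved
```

The point of the design is that the decision depends on where the files are, not on what the Index column looks like.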
Hello,
I also got a similar exception after FASTQ generation in the call-collect-summaries step, where it seems that the pipeline is not creating the sample folder (as suggested above?).
CommandException: No URLs matched: gs://fc-secure-a6aa0703-c003-48d2-bfaf-557d70adefb1/201230_NB501935_0898_AH7H7HBGXF/cellranger_output/nasal_mucosa_test2_rna/metrics_summary.csv
I am running this on data from a NextSeq. Interestingly, when I submitted using a sample sheet that specified only 1 for Lane for this sample and its associated antibody hash, the pipeline didn't throw any error. Now that I am trying to run it with Lane set to 1-4 or *, I am getting this error.
It's clear that the folder and this file do not exist. The pipeline only spits out FASTQs that are separated by sample and lane.
Thanks for any insight!
It sounds like you also have the situation where one step looks for a folder that was not created by the previous step.
To resolve this particular issue, I manually created the missing folders with the gsutil tool and moved the files by myself. Then the rest of the workflow continued without errors.
CommandException: No URLs matched: gs://fc-abc-123/steve/project/2020-12-22/output/201222_NB551582_0040_AH2YNLBGXH_fastqs/fastq_path/H2YNLBGXH/project_mgh_3_gex
CommandException: 1 file/object could not be transferred.
In this case I had files like this:
fastq_path/H2YNLBGXH/{sample_id}_S*_L*_*_001.fastq.gz
But cellranger_count was looking for:
fastq_path/H2YNLBGXH/{sample_id}/{sample_id}_S*_L*_*_001.fastq.gz
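For anyone who needs the same manual workaround, here is a small sketch that maps a flat FASTQ URL to the nested one and builds the matching `gsutil mv` command. The function names are hypothetical, and the `S1`, `L001`, and `R1` parts of the example file name are assumed for illustration. The script only prints commands, it does not run them.

```python
import posixpath
import re

# Assumed file name layout: {sample_id}_S<n>_L<lane>_<read>_001.fastq.gz
FASTQ_NAME = re.compile(r"^(?P<sample_id>.+)_S\d+_L\d+_[A-Z]\d_001\.fastq\.gz$")


def expected_url(flat_url: str) -> str:
    """Return the URL with a {sample_id} folder inserted before the file name."""
    parent, name = posixpath.split(flat_url)
    match = FASTQ_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised FASTQ file name: {name}")
    return posixpath.join(parent, match.group("sample_id"), name)


def gsutil_mv_command(flat_url: str) -> list:
    """Build (but do not run) the gsutil command that moves one file."""
    return ["gsutil", "mv", flat_url, expected_url(flat_url)]


if __name__ == "__main__":
    # Hypothetical file under the path from the error message above.
    url = (
        "gs://fc-abc-123/steve/project/2020-12-22/output/"
        "201222_NB551582_0040_AH2YNLBGXH_fastqs/fastq_path/H2YNLBGXH/"
        "project_mgh_3_gex_S1_L001_R1_001.fastq.gz"
    )
    print(" ".join(gsutil_mv_command(url)))
```

In a bucket the "folder" is just part of the object name, so moving the files to the nested path is enough to make it appear.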
Maybe a separate issue, but is the workflow built to collapse samples split across lanes (and thus in separate FASTQs) during the cellranger_count or call-collect-summaries step? Or maybe this problem could be avoided if I demultiplexed separately to only have one FASTQ with the data from all lanes merged?
Hi @slowkow @majorkazer,
I can confirm the issue was caused by dual index. We have fixed this issue in the newly released Cumulus 1.2.0 (https://cumulus.readthedocs.io/en/latest/index.html). Please give it a try!
Problem
I ran the cellranger_workflow version 14 and Cell Ranger version 4.0.0.
Here's the error I get:
Here's the failed job, in case you want to have a look for yourself: https://job-manager.dsde-prod.broadinstitute.org/jobs/95552f84-77fd-4174-8561-99afb9ea9771
This error occurs because cellranger_count is looking for a folder that is called {sample_id}, in this case "project_mgh_3_gex".
But a folder with the name sample_id is not created by the cellranger_mkfastq step. Is it possible that cellranger mkfastq did not make the {sample_id} folder for some reason? For example, could it be because the sample sheet had only 1 sample?
I can see that cellranger_mkfastq created FASTQ files like {sample_id}_S*.fastq.gz without making a folder called {sample_id}.
Here is the code where cellranger_count is looking for the folder {sample_id}:
https://github.com/klarman-cell-observatory/cumulus/blob/06968583beabbc25684e51fa74761b90593a3028/workflows/cellranger/cellranger_count.wdl#L107-L112
Solution
Maybe this code should be changed? It seems that this code is checking for the situation where we have {sample_id}_S*.fastq.gz filenames, and then it makes a new folder called {sample_id} and moves the FASTQ files into it.
https://github.com/klarman-cell-observatory/cumulus/blob/06968583beabbc25684e51fa74761b90593a3028/workflows/cellranger/cellranger_mkfastq.wdl#L109-L121