odomlab2 / sci-rocket

Snakemake workflow for (pre-)processing sci-RNA-seq3 data
MIT License

bcl2fastq > cbcl files not found #17

Closed plhm closed 11 months ago

plhm commented 11 months ago

Hello there all.

Snakemake is trying to run the bcl2fastq command to obtain fastqs from the bcl files, but I got the following error when doing so:

2023-12-15 14:03:10 [2b8569c9a180] INFO: Patterned flowcell detected
2023-12-15 14:03:10 [2b8569c9a180] WARNING: No cbcl files found for cycle 1. Attempt 1 of 3. Retry with cycle 2.
2023-12-15 14:03:10 [2b8569c9a180] WARNING: No cbcl files found for cycle 2. Attempt 2 of 3. Retry with cycle 3.
2023-12-15 14:03:10 [2b8569c9a180] ERROR: bcl2fastq::common::Exception: 2023-Dec-15 14:03:10: Success (0): /opt/conda/conda-bld/bcl2fastq2_1548424849859/work/src/cxx/lib/layout/Layout.cpp(1028): Throw in function static void bcl2fastq::layout::TileLayoutDetector::readTilesFromCbclHeaders(const boost::filesystem::path&, const bcl2fastq::config::SampleSheetCsv&, const std::vector<boost::basic_regex<char, boost::regex_traits<char> > >&, const string&, std::vector<bcl2fastq::layout::LaneInfo>&, bcl2fastq::common::TileFileMap&, bcl2fastq::common::NumBasesPerByte&, bcl2fastq::common::CycleNumber, bcl2fastq::common::CycleNumber, bool)
Dynamic exception type: boost::exception_detail::clone_impl<bcl2fastq::common::InputDataError>
std::exception::what:
No cbcl files found for cycle 3. Attempt 3 of 3.
Corrupt or missing .cbcl files were found for 3 consecutive cycles, verify the source of the data.

Here is the command I ran:

bcl2fastq --barcode-mismatches 1 --ignore-missing-positions --ignore-missing-controls --ignore-missing-filter --ignore-missing-bcls --no-lane-splitting --minimum-trimmed-read-length 15 --mask-short-adapter-reads 15 -R /project/kyathit/RawData/231208_sciseq-Pilot/usftp21.novogene.com/20231128_lh00328_0116_A22GHJ3LT3-L3 --sample-sheet rerio_sci_test/raw_reads/fake.csv --output-dir rerio_sci_test/raw_reads/ --loading-threads 1 --processing-threads 1 --writing-threads 1

And here is how the fake.csv looked:

[DATA]
Lane,Sample_ID,Sample_Name,index,index2
,fake,fake,NNNNNNNNNN,NNNNNNNNNN

I am wondering whether this is an issue you've run into, or whether the bcl files I received from the company are indeed corrupt.

Best,

Pietro

J0bbie commented 11 months ago

Hi Pietro,

Hmm, never seen that before. Is it missing some files or folders? Just to give you an idea of what a typical structure looks like for our data:

[image: typical run folder structure]

I'm also adding the option to start from .fq.gz files this week; that might already resolve this for you.

plhm commented 11 months ago

Hey, Jobbie.

I figured out what was up. In short, here is how you make your dummy file for bcl2fastq:

# Generate fake sample-sheet to allow indexes to be added to R1/R2.
mkdir -p {params.path_out}
echo -e "[DATA]\nLane,Sample_ID,Sample_Name,index,index2\n,fake,fake,NNNNNNNNNN,NNNNNNNNNN" > {output.sample_sheet}
"""

I dug around in the bcl2fastq file and found that when the sample sheet does not give the program a lane (i.e. the empty Lane field in \n,fake), it tries to run bcl2fastq across all lanes in the presumed run. The issue is that my data covers only one lane out of however many lanes are in the flow cell from Novogene.

I fixed the issue by going into your step1_bcl2fastq.smk file and hardcoding 003 in your echo line (it turns out that my lane was lane 3).

As you said, your fastq option will likely fix this. But if others run into the same issue, may I suggest that you either add a line of code that checks which lanes are present in the bcl folder, or ask the user to tell the program which lanes are in the dataset.
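
For what it's worth, here is a minimal sketch of the first suggestion (checking which lanes are in the bcl folder). It assumes the usual Illumina layout where each lane has an L00N folder under Data/Intensities/BaseCalls, and it only looks for .cbcl files, since that is what the log above complains about. The names detect_lanes and fake_sample_sheet are made up for illustration and are not part of sci-rocket. The lane is written as 003 because that is what worked for me. I have not tested a sheet with several lane rows against bcl2fastq.

```python
import re
import tempfile
from pathlib import Path

# Lane folders are assumed to be named L001, L002, ... (standard Illumina layout).
LANE_DIR = re.compile(r"^L(\d{3})$")


def detect_lanes(run_dir):
    """Return the sorted lane numbers whose L00N folder holds at least one .cbcl file."""
    basecalls = Path(run_dir) / "Data" / "Intensities" / "BaseCalls"
    lanes = []
    if not basecalls.is_dir():
        return lanes
    for entry in basecalls.iterdir():
        match = LANE_DIR.match(entry.name)
        # Skip lane folders that exist but contain no base-call data.
        if match and entry.is_dir() and any(entry.rglob("*.cbcl")):
            lanes.append(int(match.group(1)))
    return sorted(lanes)


def fake_sample_sheet(lanes):
    """Build the dummy sample sheet text with one row per detected lane."""
    header = "[DATA]\nLane,Sample_ID,Sample_Name,index,index2\n"
    # Zero-padded lane (e.g. 003), mirroring the hardcoded value from this thread.
    rows = [f"{lane:03d},fake,fake,NNNNNNNNNN,NNNNNNNNNN" for lane in lanes]
    return header + "\n".join(rows) + "\n"


if __name__ == "__main__":
    # Demo on a mock run folder: lane 1 is empty, lane 3 has data.
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp) / "Data" / "Intensities" / "BaseCalls"
        (base / "L001").mkdir(parents=True)
        (base / "L003" / "C1.1").mkdir(parents=True)
        (base / "L003" / "C1.1" / "L003_1.cbcl").touch()
        print(fake_sample_sheet(detect_lanes(tmp)), end="")
```

In the workflow, the detected lanes could replace the empty Lane field in the echo line, or the rule could stop with a clear message when no lane folder contains data.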