Open nick-youngblut opened 7 months ago
I believe matched_reads.fastq is supposed to be created after the console output "Generating deduplicated fastq file ..."; maybe @youyupei can comment on this.
In the meantime, if you just want to get things done, it is perfectly fine to work around this by soft-linking: ln -s input.fq matched_reads.fastq
Also, can you confirm that the read IDs in the fastq file are in the format @[cell barcode]_[UMI]#[other stuff]?
Hi @nick-youngblut, matched_reads.fastq is expected to be in the output directory when demultiplexing was done using the integrated demultiplexing step (with either BLAZE or flexiplex) in FLAMES. @ChangqingW is correct: if you have run BLAZE or flexiplex separately and written the output to a different filename, you can simply make a symbolic link as @ChangqingW suggested.
As the genome alignment has already been done, you don't have to rerun it (you could set "do_genome_alignment": [false]).
Thanks @youyupei and @ChangqingW for the information.
BLAZE wrote a compressed fastq, matched_reads.fastq.gz. I used this file as input to the pipeline:
sc_long_pipeline(
  fastq = 'path/to/matched_reads.fastq.gz'
)
This is in the log that I provided:
input fastq: /home/rstudio/workspace//data/SspArc0008_10x_cDNA_longRead//blaze_output/matched_reads.fastq.gz
The first line from the fastq: @GGAGCAACAAGTGGCA_GGGTGAACTCGA#c3b08a02-a1e2-4cf5-a4ea-8474f5dd9789_+
So it appears that the format is @[cell barcode]_[UMI]#[other stuff].
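For anyone who wants to check the read ID format programmatically, here is a minimal sketch in Python. The regular expression is my own assumption, based only on the example header above (barcode and UMI made of A/C/G/T/N, lengths left open); parse_read_id is a hypothetical helper and not part of FLAMES or BLAZE.

```python
import re

# Assumed layout: @[cell barcode]_[UMI]#[other stuff]
# Barcode and UMI lengths are deliberately not fixed, since they depend on the chemistry.
READ_ID_PATTERN = re.compile(r"^@([ACGTN]+)_([ACGTN]+)#(.+)$")


def parse_read_id(header):
    """Return (barcode, umi, rest) if the header matches the assumed layout, else None."""
    match = READ_ID_PATTERN.match(header.strip())
    return match.groups() if match else None


# Header copied from the first line of the fastq quoted in this thread.
header = "@GGAGCAACAAGTGGCA_GGGTGAACTCGA#c3b08a02-a1e2-4cf5-a4ea-8474f5dd9789_+"
print(parse_read_id(header))
```

A header that does not follow this layout returns None, which makes it easy to scan the first few records of a file before starting a long run.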
More generally, sc_long_pipeline() took a long time to fail when given a gzip'd fastq as input. It would be helpful if all inputs were checked at the start of the pipeline, so that the software can fail fast.
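To illustrate the kind of up-front check I mean, here is a rough sketch in Python of a pre-flight test one could run before launching the pipeline. It is not how FLAMES validates input; check_fastq_input and is_gzipped are hypothetical helpers, and the only facts relied on are the gzip magic bytes (0x1f 0x8b) and that fastq records start with "@".

```python
import os

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def is_gzipped(path):
    """Detect gzip compression from the file's magic bytes, not its extension."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def check_fastq_input(path):
    """Raise immediately if the input is missing, gzip'd, or does not look like fastq."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"fastq input not found: {path}")
    if is_gzipped(path):
        raise ValueError(f"fastq input is gzip-compressed, decompress it first: {path}")
    with open(path, "rb") as handle:
        if handle.read(1) != b"@":
            raise ValueError(f"file does not start with a fastq header: {path}")
    return path
```

Running a check like this takes a fraction of a second, so a bad input is reported before any alignment or quantification work starts.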
I see. This is a bug. When the pipeline's demultiplexing step is skipped, the gene quantification step does not respect the fastq input name and searches for matched_reads.fastq instead.
It's also coded to look for an uncompressed fastq, but BLAZE outputs a compressed fastq. I was hoping to just symlink matched_reads.fastq.gz into the FLAMES working directory that I'm using, but I'm going to have to gunzip it first.
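For reference, the decompress-and-place step can be sketched in Python as below. This is only a workaround sketch: stage_matched_reads is a hypothetical helper, and the target name matched_reads.fastq is taken from the discussion in this thread rather than from the FLAMES source.

```python
import gzip
import os
import shutil


def stage_matched_reads(gz_path, outdir, name="matched_reads.fastq"):
    """Decompress gz_path into outdir under the file name the pipeline looks for."""
    os.makedirs(outdir, exist_ok=True)
    target = os.path.join(outdir, name)
    # Stream the data so large fastq files are never held in memory at once.
    with gzip.open(gz_path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

The original .gz file is left untouched, so the BLAZE output directory stays as it was.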
Just wanted to check: any updates on this bug?
Hi Nick, sorry, not yet. I am a bit inundated with other tasks at the moment.
Thanks @ChangqingW for the update
The traceback:
There is indeed no file named matched_reads.fastq in the output directory. The files in the output directory are:
The entire run log:
I'm using FLAMES 1.8.0.