Open nick-youngblut opened 5 months ago
Note: I'm using the gtf file instead of gff based on https://github.com/mritchielab/FLAMES/issues/27
Seems to be a bug in the new Rcpp implementation. I have not been able to locate it yet, but you could edit the config file to either switch `multithread_isoform_identification` to false (to use the old Python implementation) or switch `bambu_isoform_identification` to true (to use bambu instead).
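For reference, those switches live in the JSON config file passed to the pipeline. A minimal sketch (the key names come from this thread; nesting them under `pipeline_parameters` is my assumption about the config layout):

```json
{
  "pipeline_parameters": {
    "multithread_isoform_identification": false,
    "bambu_isoform_identification": true
  }
}
```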
If I use `bambu_isoform_identification = TRUE`, then I get the following error:
```
Error in check_arguments(annotation, fastq, genome_bam, outdir, genome_fa, :
  Bambu requires GTF format for annotation file.
```
I'm using a GTF file for the annotation input, but it is gzip'd, which I'm guessing is causing the error.
Hmm, bambu can probably read gzipped annotation but the custom parser in our legacy code won't. I'll specify that in the docs.
For the `multithread_isoform_identification` implementation I got an `Error: unordered_map::at` error with a plain GTF, but there seem to be a ton of `unordered_map` calls in the functions and I cannot locate where the issue is at the moment.
Moreover, I get the following error even when using an uncompressed GTF file:

```
[E::fai_build3_core] Cannot index files compressed with gzip, please use bgzip
```
It appears that the `genome_fa` must be bgzip'd or uncompressed. Do you think you will just create uncompressed temporary files for these reference files, at least when the user selects bambu?
Any updates on this?
Same for the `please use bgzip` error: is it necessary for FLAMES to create a temporary uncompressed (or bgzip-compressed) file when the user selects bambu?
Sounds reasonable; I think it assumes everything is uncompressed at the moment.
Some printf debugging led me to believe the problem is coming from the `get_gene_blocks` function around here, because not all genes present in the `chr_to_gene` map are present in the `gene_dict` map for some reason.
@ChangqingW
So, in my case the issue is coming from `get_gene_blocks`, as I described before. The gene it's failing at is a small mitochondrial gene with only 1 transcript containing only 1 exon.

Looks like it's missing from `gene_dict` because it's also not present in the `gene_to_transcript` map after `remove_similar_tr` was run. Pretty sure on this line the whole outer loop iteration is skipped, including this block that is supposed to add the gene to the new `out_gene_to_transcript` map.
I'm not quite sure if the intention behind that `continue` was to skip those genes entirely or to only skip the isoform comparison. If it's the former, you'd need to add some condition downstream that skips the genes not present in `gene_to_transcript` and `gene_dict`; if the latter, you need to change the `continue` statement to only apply to the inner loop.
@maxim-h
Thanks for your suggestions! It looks like `remove_similar_tr` was ignoring genes with only 1 transcript, which was not the intended behaviour. I've updated the relevant function in 6412113f5772cc9c2bff1a35f6f18a73ef2c89d1.
During the `Reading Gene Annotations` step in the `sc_long_pipeline()` workflow, I'm getting a `_Map_base::at` error. I'm using a decompressed fastq for the BLAZE output (ran that stand-alone). I'm using `GCF_000001405.40_GRCh38.p14` for the reference (`GCF_000001405.40_GRCh38.p14_genomic.fna.gz` and `GCF_000001405.40_GRCh38.p14_genomic.gtf.gz`).

My config:

My `sc_long_pipeline()` job:

My console output:
I'm using the patched version of FLAMES from https://github.com/mritchielab/FLAMES/issues/26.
My sessionInfo: