Open ZabalaAitor opened 2 weeks ago
Hey,
This happens if the GTF file does not meet the expectations. In this case, the gene_id and transcript_id fields in the attributes column are missing. Please make sure to use an appropriate GTF file.
Also, the pipeline version seems to be a bit outdated. Please update it using nextflow pull nf-core/circrna
Hey,
I used the default GTF file provided by iGenomes, which I believe should have the correct format. Regarding the pipeline, I did update it using nextflow pull nf-core/circrna, but it's possible that the update didn't complete properly due to issues with the HPC environment. I'll look into it to ensure the pipeline is fully updated.
Thanks,
I am sure the GTF will have the correct format; otherwise, errors will look different. The problem occurs because the GTF contains regions on sequences not present in the FASTA file.
This problem will also occur on the latest pipeline version, as I have not yet had time to fix it - this was just a side note.
EDIT: This message was a mixup - forget about it
The FASTA file is also provided by iGenomes...
Oh I'm sorry, I got mixed up between two issues. This issue does not have anything to do with the FASTA file. The one with the FASTA file compatibility problems is #151.
Still, the error you encounter is due to missing gene_id and transcript_id entries in the GTF file. nf-core also discourages the usage of iGenomes as stated here. Maybe look inside the GTF file and see for yourself, but I can also add a check to the pipeline, which will give a user-friendly message if this happens again. To fix this, I can recommend reference data from here.
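For anyone who wants to look inside the GTF file themselves, here is a minimal sketch of such a check. It is not the pipeline's own code; the function names are made up for illustration, and it assumes a tab-separated GTF with the attributes in column 9. It also assumes that rows with feature type "gene" may legitimately lack transcript_id, as is the case in some annotation sources.

```python
from collections import Counter

REQUIRED_KEYS = ("gene_id", "transcript_id")


def attribute_keys(attributes):
    """Return the set of keys found in a GTF attributes column (column 9)."""
    keys = set()
    for part in attributes.split(";"):
        tokens = part.strip().split(None, 1)
        if tokens:
            keys.add(tokens[0])
    return keys


def missing_attributes(gtf_lines, required=REQUIRED_KEYS):
    """Count feature lines that lack each required attribute key."""
    missing = Counter()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            missing["malformed"] += 1  # fewer than 9 tab-separated columns
            continue
        feature, keys = fields[2], attribute_keys(fields[8])
        for key in required:
            # Assumption: "gene" rows are allowed to have no transcript_id.
            if key == "transcript_id" and feature == "gene":
                continue
            if key not in keys:
                missing[key] += 1
    return missing
```

It can be run on a file with something like `with open("genes.gtf") as handle: print(missing_attributes(handle))`, where genes.gtf is a placeholder name. An empty result means every checked line carries both keys.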
I tried using another GTF file and encountered an error while running CIRIquant because it is unable to find the GTF file, whereas other tools, such as circRNA_finder, are able to do so.
I have written about the issue in #155. Please feel free to delete or close that entry if you prefer to resolve the issue here.
Thank you very much for your time and assistance.
This error persists despite using different GTF files. Could it be because there are no circRNAs in those samples?
You are absolutely right, this can also occur if no circRNAs are found. I should have thought about this earlier. You can confirm this is the case by switching to /scratch/azabala/work_sncRNA_circRNA/83/3b958d1d7194efaa23a82450c6e7f5 and investigating the GTF file there.
If it is really the case, I will implement a clear error message pointing this out for future users.
I cannot find the GTF file in that directory, but the intersect.bed file is empty.
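To see which tasks are affected, a small sketch like the following could list every empty intersect.bed under a work directory. The function name and the glob pattern are assumptions for illustration; adjust the pattern if the files carry a sample prefix.

```python
from pathlib import Path


def empty_outputs(work_dir, pattern="*intersect.bed"):
    """Return sorted paths under work_dir that match pattern and have zero size."""
    return sorted(
        path
        for path in Path(work_dir).rglob(pattern)
        if path.is_file() and path.stat().st_size == 0
    )
```

Calling `empty_outputs("/scratch/azabala/work_sncRNA_circRNA")` would return the matching paths, and their parent directories are the task directories to inspect.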
Yes okay, this is the reason then. Is the data you used confidential? Otherwise, I would like to use it as test data for coming up with a clean solution.
Description of the bug
Hello,
I am trying to run nf-core/circrna on sncRNA samples, and I encountered an error during the annotation step for some of the samples. I noticed that the samples with errors have an empty intersect.bed file.
I am wondering what information is supposed to be in the intersect.bed file and what biological reasons could cause it to be empty.
Thank you very much,
Aitor Zabala
Command used and terminal output
Relevant files
No response
System information
Nextflow: 23.04.2
Hardware: HPC
Executor: slurm
Container: Apptainer
OS: Linux
nf-core/circrna: dev