ferygood closed this issue 2 years ago.
Hi,
Thank you for your interest in the software. I noticed that your Snakemake file refers to hg38_rmsk_TE.gtf.ind, which is the pre-built index for TEcount (available here). Are you using that instead of the GTF (which is available here)? I suspect that if you didn't download the pre-built index (indicated by the .ind file extension), you might need to go into the Snakemake file and change the TE file to refer to the GTF (.gtf file extension) instead.
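A minimal Python sketch of that check, for example in the Python part of a Snakefile. It assumes the file extension alone distinguishes the two downloads (.ind for the pre-built index, .gtf for the plain GTF); te_annotation_kind is a hypothetical helper, not part of TEtranscripts:

```python
from pathlib import Path


def te_annotation_kind(path: str) -> str:
    """Classify a TE annotation file by extension (hypothetical helper).

    Assumes ".ind" marks the pre-built TEcount index and ".gtf" a plain GTF;
    any other extension is rejected so a misconfigured path fails early.
    """
    suffix = Path(path).suffix.lower()  # only the last extension, e.g. ".ind"
    if suffix == ".ind":
        return "index"
    if suffix == ".gtf":
        return "gtf"
    raise ValueError(f"unrecognised TE annotation extension: {path!r}")


if __name__ == "__main__":
    for name in ("hg38_rmsk_TE.gtf.ind", "hg38_rmsk_TE_20200804.gtf"):
        print(name, "->", te_annotation_kind(name))
```

Running it prints "hg38_rmsk_TE.gtf.ind -> index" and "hg38_rmsk_TE_20200804.gtf -> gtf", which makes it easy to see which of the two files a config actually points at.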
Please let me know if that doesn't resolve the issue.
Thanks.
Hi Oliver,
Thank you for your fast response. Sorry, I forgot to mention that I have also tried using the GTF (hg38_rmsk_TE_20200804.gtf) downloaded from the link you provided instead of the pre-built index. I also quickly ran it again from the command line and got the same error:
(snakemake) yaochung41@host:/data/scratch/yaochung/mayo$ singularity exec tetranscripts.sif TEcount -b star_cer/pe/1015_CER/pe_aligned.bam --GTF /data/scratch/yaochung/ref/hg38.ensGene.gtf --TE /data/scratch/yaochung/ref/hg38_rmsk_TE_20200804.gtf --sortByPos --project 1015_CER
ERROR:root:No such file: /data/scratch/yaochung/ref/hg38_rmsk_TE_20200804.gtf !
hg_rmsk_TE_20200804.gtf
(snakemake) yaochung41@host:/data/scratch/yaochung/mayo$ head -n 10 /data/scratch/yaochung/ref/hg38_rmsk_TE_20200804.gtf
chr1 hg38_rmsk exon 10469 11447 3612 - . gene_id "TAR1"; transcript_id "TAR1"; family_id "telo"; class_id "Satellite";
chr1 hg38_rmsk exon 11505 11675 484 - . gene_id "L1MC5a"; transcript_id "L1MC5a_dup3"; family_id "L1"; class_id "LINE";
chr1 hg38_rmsk exon 11678 11780 239 - . gene_id "MER5B"; transcript_id "MER5B_dup1"; family_id "hAT-Charlie"; class_id "DNA";
chr1 hg38_rmsk exon 15265 15355 318 - . gene_id "MIR3"; transcript_id "MIR3_dup11"; family_id "MIR"; class_id "SINE";
chr1 hg38_rmsk exon 18907 19048 239 + . gene_id "L2a"; transcript_id "L2a_dup29"; family_id "L2"; class_id "LINE";
chr1 hg38_rmsk exon 19972 20405 994 + . gene_id "L3"; transcript_id "L3_dup3"; family_id "CR1"; class_id "LINE";
chr1 hg38_rmsk exon 20531 20679 270 + . gene_id "Plat_L3"; transcript_id "Plat_L3"; family_id "CR1"; class_id "LINE";
chr1 hg38_rmsk exon 21949 22075 254 + . gene_id "MLT1K"; transcript_id "MLT1K_dup2"; family_id "ERVL-MaLR"; class_id "LTR";
chr1 hg38_rmsk exon 23120 23371 787 - . gene_id "MIR"; transcript_id "MIR_dup22"; family_id "MIR"; class_id "SINE";
chr1 hg38_rmsk exon 23804 24038 312 + . gene_id "L2b"; transcript_id "L2b_dup15"; family_id "L2"; class_id "LINE";
Therefore, I do not think the error is caused by a difference in file format.
Hi,
I will contact a colleague who has more experience with Singularity, and see if this is an issue with the container or Singularity itself.
Thank you for your patience.
Hello,
Oliver contacted me about this Singularity issue. I've been traveling, so please excuse any delays. Reading your remarks about Singularity being unable to find a particular file on the filesystem, one thing that comes to mind is that the directory containing that file may not be bound into the container. Singularity binds some default locations, but it needs to be told to bind paths like /data on the system configuration side.
You can also fix a directory-readability problem at runtime using the --bind flag. The documentation for the Singularity user bind options can be found here.
You might want to try binding the data directory you are using:
--bind /data/scratch/yaochung:/genomics
With that binding, the data would be available inside the container under the /genomics directory:
singularity exec --bind /data/scratch/yaochung:/genomics tetranscripts.sif TEcount \
-b /genomics/mayo/star_cer/pe/1015_CER/pe_aligned.bam \
--GTF /genomics/ref/hg38.ensGene.gtf \
--TE /genomics/ref/hg38_rmsk_TE_20200804.gtf \
--project 1015_CER \
--sortByPos
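To make the path translation explicit, here is a small Python sketch of how a host path maps to its in-container path under --bind src:dst. It only models the binds you pass explicitly (it ignores any default binds configured by the system), assumes absolute host paths, and to_container_path is a hypothetical helper name:

```python
from pathlib import PurePosixPath


def to_container_path(host_path: str, binds: dict[str, str]) -> str:
    """Translate an absolute host path to its path inside the container.

    `binds` maps host directories to container mount points, mirroring
    `--bind src:dst`. The most specific (deepest) matching source wins.
    Raises ValueError if the path is not under any bound directory.
    """
    path = PurePosixPath(host_path)
    best_depth, best = -1, None
    for src, dst in binds.items():
        try:
            # Component-wise match, so /a/bc is NOT treated as being under /a/b.
            rel = path.relative_to(src)
        except ValueError:
            continue  # not under this bind source
        depth = len(PurePosixPath(src).parts)
        if depth > best_depth:
            best_depth, best = depth, PurePosixPath(dst) / rel
    if best is None:
        raise ValueError(f"{host_path} is not under any bound directory")
    return str(best)


if __name__ == "__main__":
    binds = {"/data/scratch/yaochung": "/genomics"}
    for p in (
        "/data/scratch/yaochung/ref/hg38_rmsk_TE_20200804.gtf",
        "/data/scratch/yaochung/mayo/star_cer/pe/1015_CER/pe_aligned.bam",
    ):
        print(p, "->", to_container_path(p, binds))
```

For example, the BAM in the original command was given relative to /data/scratch/yaochung/mayo, so under this binding it appears at /genomics/mayo/star_cer/pe/1015_CER/pe_aligned.bam inside the container.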
Hopefully that helps; if it doesn't, we'll set something up and step through the configs.
Thank you for the suggestion! However, I am also traveling at the moment. I will give it a try next week!
Hello,
I ran into the same issue today, and adding --bind solved the problem. As I needed to bind multiple directories, I used:
singularity exec --bind /dir1,/dir2 tetranscripts.sif
Thank you! Raquel
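Building that comma-separated list by hand gets tedious when inputs are scattered across directories. A minimal Python sketch, assuming each input's parent directory should be bound at the same path inside the container (a bare src entry, as in the example above) and dropping directories nested inside one already listed; bind_spec is a hypothetical helper name:

```python
from pathlib import PurePosixPath


def bind_spec(paths: list[str]) -> str:
    """Return a comma-separated --bind value covering the given files (hypothetical helper).

    Each file's parent directory is bound at the same path inside the container;
    directories nested inside another listed directory are dropped as redundant.
    """
    # Shallowest directories first, so a parent is always seen before its children.
    dirs = sorted({PurePosixPath(p).parent for p in paths}, key=lambda d: len(d.parts))
    kept: list[PurePosixPath] = []
    for d in dirs:
        if not any(k in d.parents for k in kept):
            kept.append(d)
    return ",".join(sorted(str(d) for d in kept))


if __name__ == "__main__":
    inputs = [
        "/data/scratch/yaochung/ref/hg38.ensGene.gtf",
        "/data/scratch/yaochung/ref/hg38_rmsk_TE_20200804.gtf",
        "/data/scratch/yaochung/mayo/star_cer/pe/1015_CER/pe_aligned.bam",
    ]
    print(f"singularity exec --bind {bind_spec(inputs)} tetranscripts.sif ...")
```

Binding only the parent directories keeps the container's view narrow; binding a single common ancestor such as /data/scratch/yaochung is the simpler alternative when that is acceptable on your system.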
Hello @molikd
I can now fix the issue using the --bind flag and have integrated TEcount into my Snakemake pipeline. My files are also in multiple directories, so I followed @Raquelina's solution of separating them with commas. Thank you very much for your solution!
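For a Snakemake pipeline, one option is to assemble the full command as an argument list in Python and run it from a run: block, or join it with shlex.join for a shell: directive. A sketch under these assumptions: only the TEcount flags that appear in this thread are used, binds are bare src entries so paths are unchanged inside the container, and tecount_argv / run_tecount are hypothetical helpers:

```python
import shlex
import subprocess
from typing import Sequence


def tecount_argv(image: str, binds: Sequence[str], bam: str,
                 gene_gtf: str, te_gtf: str, project: str) -> list[str]:
    """Assemble `singularity exec [--bind ...] IMAGE TEcount ...` (hypothetical helper)."""
    argv = ["singularity", "exec"]
    if binds:
        # Comma-separated bare paths: each is mounted at the same path in the container.
        argv += ["--bind", ",".join(binds)]
    argv += [
        image, "TEcount",
        "-b", bam,
        "--GTF", gene_gtf,
        "--TE", te_gtf,
        "--sortByPos",
        "--project", project,
    ]
    return argv


def run_tecount(**kwargs) -> None:
    """Run TEcount through Singularity; raises CalledProcessError on failure."""
    subprocess.run(tecount_argv(**kwargs), check=True)


if __name__ == "__main__":
    argv = tecount_argv(
        image="tetranscripts.sif",
        binds=["/data/scratch/yaochung"],
        bam="/data/scratch/yaochung/mayo/star_cer/pe/1015_CER/pe_aligned.bam",
        gene_gtf="/data/scratch/yaochung/ref/hg38.ensGene.gtf",
        te_gtf="/data/scratch/yaochung/ref/hg38_rmsk_TE_20200804.gtf",
        project="1015_CER",
    )
    # Print the command as a single shell-quoted string instead of running it.
    print(shlex.join(argv))
```

Passing absolute paths for every input means the command does not depend on the working directory the pipeline happens to run from.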
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days
Hello Oliver, I want to integrate TEtranscripts into my Snakemake pipeline. However, I get an error when using it with Singularity.
I can download the container by following the instructions:
singularity pull tetranscripts.sif docker://mhammelllab/tetranscripts:latest
However, when I run TEcount, it gives me an error saying it cannot find the GTF file for transposable elements. This error occurs both in the Snakemake pipeline and when executing directly from the command line. The strange thing is that it can find the GTF file for gene annotations but not the one for transposable elements. I have checked that all the file paths are correct.
Could you give me some suggestions about this issue? For example, do I need to manually mount files before using it? Many thanks!