mhammell-laboratory / TEtranscripts

A package for including transposable elements in differential enrichment analysis of sequencing datasets.
http://hammelllab.labsites.cshl.edu/software/#TEtranscripts
GNU General Public License v3.0

Singularity ERROR:root:No such file #119

Closed: ferygood closed this issue 1 year ago

ferygood commented 1 year ago

Hello Oliver, I want to integrate TEtranscripts into my Snakemake pipeline. However, I get an error when running it with Singularity.

I can download the container by following the instructions: singularity pull tetranscripts.sif docker://mhammelllab/tetranscripts:latest

However, when I run TEcount, it fails with an error saying it cannot find the GTF file for transposable elements. The error occurs both in the Snakemake pipeline and when I run the command directly from the command line. Strangely, it can find the GTF file for gene annotations but not the one for transposable elements. I have checked that all file paths are correct.

Could you give me some suggestions about this issue? For example, do I need to manually mount the files before running it? Many thanks!

Mon Aug 29 14:27:45 CEST 2022
cmp249
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job        count    min threads    max threads
-------  -------  -------------  -------------
TEcount       38              1              1
all            1              1              1
total         39              1              1

Select jobs to execute...

[Mon Aug 29 14:27:48 2022]
rule TEcount:
    input: star_cer/pe/892_CER/pe_aligned.bam, /data/scratch/yaochung/ref/hg38.ensGene.gtf, /data/scratch/yaochung/ref/hg38_rmsk_TE.gtf.ind
    output: tecount/892_CER.cntTable
    jobid: 43
    reason: Missing output files: tecount/892_CER.cntTable
    wildcards: sample=892_CER
    resources: tmpdir=/tmp

ERROR:root:No such file: /data/scratch/yaochung/ref/hg38_rmsk_TE.gtf.ind !

[Mon Aug 29 14:27:50 2022]
Error in rule TEcount:
    jobid: 43
    output: tecount/892_CER.cntTable
    shell:
        singularity exec tetranscripts.sif TEcount -b star_cer/pe/892_CER/pe_aligned.bam --GTF /data/scratch/yaochung/ref/hg38.ensGene.gtf --TE /data/scratch/yaochung/ref/hg38_rmsk_TE.gtf.ind --sortByPos --project tecount/892_CER
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2022-08-29T142746.338214.snakemake.log
olivertam commented 1 year ago

Hi,

Thank you for your interest in the software. I noticed that your Snakemake rule refers to hg38_rmsk_TE.gtf.ind, which is the prebuilt index for TEcount (available here). Are you using that instead of the GTF (which is available here)? I suspect that if you didn't download the prebuilt index (indicated by the .ind file extension), you may need to edit the Snakemake file so that the TE file refers to the GTF (.gtf file extension) instead.
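To tell the two kinds of TE annotation apart before launching TEcount, a quick pre-flight check on the host may help. This is a minimal sketch: `check_te_file` is a hypothetical helper, and it only looks at the file extension and whether the file is readable on the host, not at its contents.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify the file passed to TEcount --TE by its
# extension and confirm it is readable on the host.
check_te_file() {
  local te="$1"
  case "$te" in
    *.ind) echo "prebuilt index: $te" ;;
    *.gtf) echo "GTF annotation: $te" ;;
    *)     echo "unrecognized extension: $te" ;;
  esac
  if [ -r "$te" ]; then
    echo "readable on host"
  else
    echo "missing or unreadable on host"
    return 1
  fi
}
```

For example, `check_te_file /data/scratch/yaochung/ref/hg38_rmsk_TE.gtf.ind`. Passing this check on the host does not guarantee the file is visible inside the container.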

Please let me know if that doesn't resolve the issue.

Thanks.

ferygood commented 1 year ago

Hi Oliver,

Thank you for your fast response. Sorry, I forgot to mention that I have also tried using the GTF (hg38_rmsk_TE_20200804.gtf) downloaded from the link you provided instead of the prebuilt index. I quickly ran it again from the command line and got the same error:

(snakemake) yaochung41@host:/data/scratch/yaochung/mayo$ singularity exec tetranscripts.sif TEcount -b star_cer/pe/1015_CER/pe_aligned.bam --GTF /data/scratch/yaochung/ref/hg38.ensGene.gtf --TE /data/scratch/yaochung/ref/hg38_rmsk_TE_20200804.gtf --sortByPos --project 1015_CER
ERROR:root:No such file: /data/scratch/yaochung/ref/hg38_rmsk_TE_20200804.gtf !

hg_rmsk_TE_20200804.gtf

(snakemake) yaochung41@host:/data/scratch/yaochung/mayo$ head -n 10 /data/scratch/yaochung/ref/hg38_rmsk_TE_20200804.gtf
chr1    hg38_rmsk   exon    10469   11447   3612    -   .   gene_id "TAR1"; transcript_id "TAR1"; family_id "telo"; class_id "Satellite";
chr1    hg38_rmsk   exon    11505   11675   484 -   .   gene_id "L1MC5a"; transcript_id "L1MC5a_dup3"; family_id "L1"; class_id "LINE";
chr1    hg38_rmsk   exon    11678   11780   239 -   .   gene_id "MER5B"; transcript_id "MER5B_dup1"; family_id "hAT-Charlie"; class_id "DNA";
chr1    hg38_rmsk   exon    15265   15355   318 -   .   gene_id "MIR3"; transcript_id "MIR3_dup11"; family_id "MIR"; class_id "SINE";
chr1    hg38_rmsk   exon    18907   19048   239 +   .   gene_id "L2a"; transcript_id "L2a_dup29"; family_id "L2"; class_id "LINE";
chr1    hg38_rmsk   exon    19972   20405   994 +   .   gene_id "L3"; transcript_id "L3_dup3"; family_id "CR1"; class_id "LINE";
chr1    hg38_rmsk   exon    20531   20679   270 +   .   gene_id "Plat_L3"; transcript_id "Plat_L3"; family_id "CR1"; class_id "LINE";
chr1    hg38_rmsk   exon    21949   22075   254 +   .   gene_id "MLT1K"; transcript_id "MLT1K_dup2"; family_id "ERVL-MaLR"; class_id "LTR";
chr1    hg38_rmsk   exon    23120   23371   787 -   .   gene_id "MIR"; transcript_id "MIR_dup22"; family_id "MIR"; class_id "SINE";
chr1    hg38_rmsk   exon    23804   24038   312 +   .   gene_id "L2b"; transcript_id "L2b_dup15"; family_id "L2"; class_id "LINE";

Therefore, I do not think the error is caused by a file format difference.

olivertam commented 1 year ago

Hi,

I will contact a colleague who has more experience with Singularity, and see if this is an issue with the container or Singularity itself.

Thank you for your patience.

molikd commented 1 year ago

Hello,

Oliver contacted me about this Singularity issue. I've been traveling, so please excuse any delays. Reading your remarks about Singularity being unable to find a particular file on the filesystem, one thing that comes to mind is that the directory containing that file may not be bound into the container. Singularity binds some default locations, but it has to be told to bind paths like /data in the system configuration.
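One way to confirm this diagnosis is to compare what the host sees with what the container sees. A minimal sketch, assuming the image provides the standard `test` utility; `compare_visibility` is a hypothetical name, and the only Singularity feature it relies on is `singularity exec IMAGE COMMAND`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a path is readable on the host
# and whether it is readable from inside the container image.
compare_visibility() {
  local image="$1" path="$2"
  local host="no" container="no"
  # Check on the host.
  [ -r "$path" ] && host="yes"
  # Run the same check inside the container; this fails if the
  # directory holding the file is not bound into the container.
  singularity exec "$image" test -r "$path" && container="yes"
  echo "host=$host container=$container path=$path"
}
```

For example, `compare_visibility tetranscripts.sif /data/scratch/yaochung/ref/hg38_rmsk_TE_20200804.gtf` would be expected to report `host=yes container=no` if /data is not bound.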

You can work around an unreadable directory at runtime using the --bind flag. The documentation for Singularity's user bind options can be found here.

You might want to try binding the data directory you are using:

--bind /data/scratch/yaochung:/genomics

With that binding, the data would be available inside the container under /genomics:

singularity exec --bind /data/scratch/yaochung:/genomics tetranscripts.sif TEcount \
  -b /genomics/mayo/star_cer/pe/1015_CER/pe_aligned.bam \
  --GTF /genomics/ref/hg38.ensGene.gtf \
  --TE /genomics/ref/hg38_rmsk_TE_20200804.gtf \
  --project 1015_CER \
  --sortByPos

Hopefully that helps. If it doesn't, we'll set something up and step through the configuration.
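An alternative to remapping paths under /genomics is to bind the directory at the same path inside the container: when --bind is given only a source path, Singularity uses it as the destination too. The sketch below is a hypothetical wrapper (`run_tecount` and `DATA_ROOT` are illustrative names, not part of TEtranscripts) that keeps the host paths from the command in this thread. It assumes it is run from /data/scratch/yaochung/mayo, the working directory shown in the earlier prompt, so the relative BAM path still resolves.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: bind DATA_ROOT at the same path inside the container,
# so the host paths from the original command can be used unchanged.
# DATA_ROOT is taken from the paths in this thread; adjust for your system.
DATA_ROOT=/data/scratch/yaochung

run_tecount() {
  local sample="$1"
  # "--bind SRC" with no ":DEST" mounts SRC at the same path in the container.
  # The BAM path is relative, so run this from $DATA_ROOT/mayo as in the thread.
  singularity exec --bind "$DATA_ROOT" tetranscripts.sif TEcount \
    -b "star_cer/pe/$sample/pe_aligned.bam" \
    --GTF "$DATA_ROOT/ref/hg38.ensGene.gtf" \
    --TE "$DATA_ROOT/ref/hg38_rmsk_TE_20200804.gtf" \
    --sortByPos \
    --project "$sample"
}
```

Usage: `cd /data/scratch/yaochung/mayo && run_tecount 1015_CER`. Binding at the same path means the Snakemake rule's paths would not need rewriting.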

ferygood commented 1 year ago

Thank you for the suggestion! I am also traveling at the moment, so I will give it a try next week!

Raquelina commented 1 year ago

Hello, I ran into the same issue today, and adding --bind solved the problem. As I needed to bind multiple directories, I used singularity exec --bind /dir1,/dir2 tetranscripts.sif
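The comma-separated --bind list can also be generated from the input files themselves. A minimal sketch: `bind_list` is a hypothetical helper that resolves each file's parent directory to an absolute path, removes duplicates, and joins the result with commas. Because no `:dest` part is given, each directory is mounted at the same path inside the container, so host paths can be passed to TEcount unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the unique parent directories of the given
# files as a comma-separated list, the form "--bind /dir1,/dir2" accepts.
bind_list() {
  local f
  for f in "$@"; do
    # Resolve each file's directory to an absolute, symlink-free path;
    # if the directory does not exist, cd prints an error and it is skipped.
    (cd -- "$(dirname -- "$f")" && pwd -P)
  done | sort -u | paste -sd, -
}
```

Usage, with the files from this thread:

`singularity exec --bind "$(bind_list star_cer/pe/1015_CER/pe_aligned.bam /data/scratch/yaochung/ref/hg38.ensGene.gtf /data/scratch/yaochung/ref/hg38_rmsk_TE_20200804.gtf)" tetranscripts.sif TEcount ...`

Binding only the directories that hold inputs keeps the container's view narrow; binding a common ancestor such as /data/scratch/yaochung is the coarser but simpler alternative.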

Thank you! Raquel

ferygood commented 1 year ago

Hello @molikd

I can now fix the issue using the --bind flag and have integrated TEcount into my Snakemake pipeline. My files are also in multiple directories, so I followed @Raquelina's solution of separating them with commas. Thank you very much for your solution!

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days