trinityrnaseq / trinityrnaseq

Trinity RNA-Seq de novo transcriptome assembly
BSD 3-Clause "New" or "Revised" License

Symlink shortcut to data directory breaks `prep_rnaseq_alignments_for_genome_assisted_assembly.pl` #1177

Open a-lud opened 2 years ago

a-lud commented 2 years ago

Hi,

I've discovered an edge case where prep_rnaseq_alignments_for_genome_assisted_assembly.pl fails because it tries to create a symlink to prefix.bam.norm_200.bam in a directory where that file already exists.

As you'll see, this is very much an issue caused by my bad habits, so I'm not really sure it needs to be fixed. I just figured I'd put this here in case anyone else encounters the same problem!

Compute: Linux HPC
Version: Trinity v2.14.0 Singularity image

How I broke it

I have a symlink in my $HOME directory to a data directory for convenience (it saves me a little bit of typing).

Absolute path: /g/data/xl04/al4518
Shortcut path: /home/566/al4518/al

Where al -> /g/data/xl04/al4518, which enables me to cd al when I log in to our cluster to get to my data files rather than having to do cd /g/data/xl04/al4518 constantly.
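To illustrate what such a shortcut does, here is a minimal sketch with throwaway directories (the `mktemp` layout is hypothetical, standing in for `/g/data/xl04/al4518` and `~/al`). After `cd` through the link, the shell's logical path (`pwd -L`) keeps the shortcut, while the physical path (`pwd -P`) resolves it to the real directory, so the same directory ends up with two different spellings:

```shell
#!/usr/bin/env bash
# Minimal sketch: after cd-ing through a shortcut symlink, the logical and
# physical working directories differ. All paths are hypothetical.
set -euo pipefail

base=$(mktemp -d)
trap 'rm -rf "$base"' EXIT
mkdir "$base/data"                 # stands in for /g/data/xl04/al4518
ln -s "$base/data" "$base/al"      # stands in for ~/al -> /g/data/xl04/al4518

cd "$base/al"
echo "logical:  $(pwd -L)"         # ends in /al   (the path as typed)
echo "physical: $(pwd -P)"         # ends in /data (symlink resolved)
```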

In my genome-guided Trinity command I used the shortcut path, which broke prep_rnaseq_alignments_for_genome_assisted_assembly.pl at the following lines (line 121):

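# If the BAM's directory (as given) is not the current working directory,
# link the BAM into the cwd and use the local name from then on.
if (cwd() ne dirname(File::Spec->rel2abs($SAM_file))) {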
    &process_cmd("$SYMLINK $SAM_file " . basename($SAM_file));
    $SAM_file = basename($SAM_file);
}
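The test above compares path strings, so a directory reached through the shortcut and the same directory with the link resolved count as different. Below is a hedged sketch of a symlink-insensitive version of the same check, written in shell rather than Perl and not taken from the patched Trinity script: it resolves both sides with `cd -P`/`pwd -P` before comparing. The helper name `bam_dir_is_cwd` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the directory containing $1 is the current
# working directory, comparing physical (symlink-resolved) paths rather than
# the strings as typed.
bam_dir_is_cwd() {
    local file_dir
    # cd -P resolves symlinks; the $( ) subshell leaves our own cwd untouched
    file_dir=$(cd -P -- "$(dirname -- "$1")" && pwd -P) || return 2
    [ "$file_dir" = "$(pwd -P)" ]
}

# Usage sketch mirroring the Perl block: link only when the BAM lives elsewhere.
# if ! bam_dir_is_cwd "$SAM_file"; then
#     ln -sf "$SAM_file" "$(basename -- "$SAM_file")"
# fi
```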

I'm guessing cwd() returns the absolute path (i.e. under /g/data/xl04/al4518) (?), which differs from the path I provided. The script therefore tries to link the file into the output directory, but because the file already exists there, it fails with the following error:

Friday, July 22, 2022: 14:09:58 CMD: /usr/local/bin/util/support_scripts/prep_rnaseq_alignments_for_genome_assisted_assembly.pl --coord_sorted_SAM /home/566/al4518/al/hydmaj-genome/trinity-gg-custom/hydmaj-rna.bam.norm_200.bam -I 100000 --sort_buffer 50G --CPU 16
CMD: ln -sf /home/566/al4518/al/hydmaj-genome/trinity-gg-custom/hydmaj-rna.bam.norm_200.bam hydmaj-rna.bam.norm_200.bam
ln: '/home/566/al4518/al/hydmaj-genome/trinity-gg-custom/hydmaj-rna.bam.norm_200.bam' and 'hydmaj-rna.bam.norm_200.bam' are the same file
Error, command ln -sf /home/566/al4518/al/hydmaj-genome/trinity-gg-custom/hydmaj-rna.bam.norm_200.bam hydmaj-rna.bam.norm_200.bam died with ret 256 at /usr/local/bin/util/support_scripts/prep_rnaseq_alignments_for_genome_assisted_assembly.pl line 221.
    main::process_cmd("ln -sf /home/566/al4518/al/hydmaj-genome/trinity-gg-custom/hy"...) called at /usr/local/bin/util/support_scripts/prep_rnaseq_alignments_for_genome_assisted_assembly.pl line 123
Error, cmd: /usr/local/bin/util/support_scripts/prep_rnaseq_alignments_for_genome_assisted_assembly.pl --coord_sorted_SAM /home/566/al4518/al/hydmaj-genome/trinity-gg-custom/hydmaj-rna.bam.norm_200.bam -I 100000 --sort_buffer 50G --CPU 16  died with ret 256 at /usr/local/bin/Trinity line 2879.
    main::process_cmd("/usr/local/bin/util/support_scripts/prep_rnaseq_alignments_fo"...) called at /usr/local/bin/Trinity line 3541
    main::run_genome_guided_Trinity("/home/566/al4518/al/hydmaj-genome/trinity-gg-custom/bam/hydma"..., undef) called at /usr/local/bin/Trinity line 1427

Again, this is a consequence of my laziness, so I figured I'd put this here to remind people to use the real absolute paths rather than shortcuts through a symlink!
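A minimal sketch of that advice, assuming GNU coreutils `realpath` and assuming (as in Al's case) that the mismatch comes only from a symlinked shortcut: resolve the paths before building the Trinity command. In the log above, the `norm_200.bam` that the support script checks appears to sit in the Trinity output directory rather than next to the input BAM, so the `--output` path matters as much as the input. The helpers `canon` and `build_trinity_cmd` are hypothetical, and this is not a fix from the Trinity developers.

```shell
#!/usr/bin/env bash
# Sketch: resolve symlinks in the paths handed to Trinity so they match the
# physical directory layout. Assumes GNU coreutils realpath.
set -euo pipefail

# -m: canonicalize even if trailing components don't exist yet (e.g. --output)
canon() { realpath -m -- "$1"; }

# Hypothetical helper: fills the TRINITY_CMD array instead of running Trinity.
# usage: build_trinity_cmd BAM OUTPUT_DIR [extra Trinity options...]
build_trinity_cmd() {
    local bam out
    bam=$(canon "$1")
    out=$(canon "$2")
    shift 2
    TRINITY_CMD=(Trinity --genome_guided_bam "$bam" --output "$out" "$@")
}

# Example (paths and variables are placeholders):
# build_trinity_cmd ~/al/hydmaj-genome/trinity-gg-custom/bam/sample.bam \
#     ~/al/hydmaj-genome/trinity-gg-custom \
#     --genome_guided_max_intron 15000 --max_memory "$MEM" --CPU "$THREADS"
# "${TRINITY_CMD[@]}"
```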

Cheers Al

JuanCamargoTavares commented 1 year ago

I'm having the same issue and get the same error. I'm not sure what I'm supposed to do: I'm working on a big cluster and don't have any symlinks myself, but I guess the cluster uses them.

brianjohnhaas commented 1 year ago

Hi,

Recent versions of Trinity have an option: --no_symlink

Try using that and let's see if it resolves the symlink-related issue.

best,

~b

--

Brian J. Haas The Broad Institute http://broadinstitute.org/~bhaas http://broad.mit.edu/~bhaas

JuanCamargoTavares commented 1 year ago

Thanks for your help. Unfortunately I still get a similar error:

Monday, January 9, 2023: 12:49:15   CMD: touch /localscratch/artefac.2359679.0/trinity_reference_AA9/AA9.sort.bam.norm_200.bam.ok
Monday, January 9, 2023: 12:49:15   CMD: /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/trinity/2.14.0/trinityrnaseq-v2.14.0/util/support_scripts/prep_rnaseq_alignments_for_genome_assisted_assembly.pl --coord_sorted_SAM /localscratch/artefac.2359679.0/trinity_reference_AA9/AA9.sort.bam.norm_200.bam -I 15000 --sort_buffer 180G --CPU 18 
CMD: cp /localscratch/artefac.2359679.0/trinity_reference_AA9/AA9.sort.bam.norm_200.bam AA9.sort.bam.norm_200.bam
cp: '/localscratch/artefac.2359679.0/trinity_reference_AA9/AA9.sort.bam.norm_200.bam' and 'AA9.sort.bam.norm_200.bam' are the same file
Error, command cp /localscratch/artefac.2359679.0/trinity_reference_AA9/AA9.sort.bam.norm_200.bam AA9.sort.bam.norm_200.bam died with ret 256 at /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/trinity/2.14.0/trinityrnaseq-v2.14.0/util/support_scripts/prep_rnaseq_alignments_for_genome_assisted_assembly.pl line 221.
    main::process_cmd("cp /localscratch/artefac.2359679.0/trinity_reference_AA9/AA9."...) called at /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/trinity/2.14.0/trinityrnaseq-v2.14.0/util/support_scripts/prep_rnaseq_alignments_for_genome_assisted_assembly.pl line 123
Error, cmd: /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/trinity/2.14.0/trinityrnaseq-v2.14.0/util/support_scripts/prep_rnaseq_alignments_for_genome_assisted_assembly.pl --coord_sorted_SAM /localscratch/artefac.2359679.0/trinity_reference_AA9/AA9.sort.bam.norm_200.bam -I 15000 --sort_buffer 180G --CPU 18  died with ret 256 at /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/trinity/2.14.0/trinityrnaseq-v2.14.0/Trinity line 2879.
    main::process_cmd("/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Cor"...) called at /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/trinity/2.14.0/trinityrnaseq-v2.14.0/Trinity line 3541
    main::run_genome_guided_Trinity("/home/artefac/scratch/cip_transcriptomes/scripts/../data/refe"..., undef) called at /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/trinity/2.14.0/trinityrnaseq-v2.14.0/Trinity line 1427

I have no idea what the issue could be.
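With --no_symlink the script copies instead of linking (the `cp` line in the log), but the traceback still points at line 123, which suggests the same directory check is deciding whether to copy, so `cp` hits the same-file case. Below is a sketch reproducing that `cp` error under hypothetical paths, using a symlink as one way to get two spellings of one directory (the thread does not establish what caused the mismatch on the Compute Canada node), plus a hypothetical guard based on the shell's `-ef` test. It assumes GNU coreutils `cp` and bash.

```shell
#!/usr/bin/env bash
# Sketch: cp refuses to copy a file onto itself. Paths are hypothetical;
# assumes GNU coreutils cp and bash.
set -u
base=$(mktemp -d)
trap 'rm -rf "$base"' EXIT
mkdir "$base/data"
ln -s "$base/data" "$base/scratch"      # a second spelling of the same directory
touch "$base/data/AA9.sort.bam.norm_200.bam"
cd "$base/data"

src="$base/scratch/AA9.sort.bam.norm_200.bam"
dst=AA9.sort.bam.norm_200.bam

cp "$src" "$dst"; cp_status=$?          # fails: "... are the same file"

# Hypothetical guard: -ef is true when both names refer to the same file
# (same device and inode), so the copy can simply be skipped.
if [ "$src" -ef "$dst" ]; then
    echo "already in place, skipping copy"
else
    cp "$src" "$dst"
fi
```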

brianjohnhaas commented 1 year ago

I think I see what the issue is.

Can you share your Trinity command?

thx,

~b

JuanCamargoTavares commented 1 year ago

This was my Trinity command:

$Trinity --no_symlink \
        --genome_guided_bam ${REF_DIR}${NAME}.sort.bam \
        --genome_guided_max_intron 15000 \
        --max_memory ${MEM} --CPU ${THREADS} \
        --output ${TEMP_DIR} 

I'm working on a Compute Canada cluster, as you can see in my last comment.

Juan

brianjohnhaas commented 1 year ago

Thanks. Can you try replacing:

${REF_DIR}${NAME}.sort.bam

with just ${NAME}.sort.bam and let's see if that works...?

JuanCamargoTavares commented 1 year ago

It fails because it cannot find the bam file: Failed to open file "AA9.sort.bam" : No such file or directory

Juan

brianjohnhaas commented 1 year ago

OK. The only solution I can come up with here is if I modify one of the Trinity scripts. If I do that, would you be able to patch your software installation with it?

JuanCamargoTavares commented 1 year ago

Yes I can make a custom installation in my personal storage in the cluster. Here is my email: juan.camargotavares@mail.mcgill.ca

brianjohnhaas commented 1 year ago

The updated script is here: https://github.com/trinityrnaseq/trinityrnaseq/blob/devel/util/support_scripts/prep_rnaseq_alignments_for_genome_assisted_assembly.pl

You can just drop it in as a replacement for the existing script and try running Trinity again.

If you need to have it run from your personal storage area, then try recursively copying your current Trinity software distribution from wherever it's currently installed to your local area, and drop this replacement script in there. Just be sure it's executable: chmod 775 prep_rnaseq_alignments_for_genome_assisted_assembly.pl
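A minimal sketch of those steps (recursive copy, drop-in replacement, `chmod 775`), assuming the updated script has already been downloaded from the link above. The function name `patch_trinity_copy` and all paths are hypothetical placeholders; the relative location `util/support_scripts/` is taken from the paths in the logs above.

```shell
#!/usr/bin/env bash
# Sketch: copy an existing Trinity installation to personal storage and drop
# in a replacement support script. Paths are hypothetical; nothing is downloaded.
set -euo pipefail

# usage: patch_trinity_copy SRC_INSTALL DEST_INSTALL NEW_SCRIPT
patch_trinity_copy() {
    local src=$1 dest=$2 new_script=$3
    local rel=util/support_scripts/prep_rnaseq_alignments_for_genome_assisted_assembly.pl

    # cp -r into an existing directory would nest the copy, so refuse instead
    if [ -e "$dest" ]; then
        echo "error: $dest already exists" >&2
        return 1
    fi
    cp -r -- "$src" "$dest"              # recursive copy of the distribution
    cp -- "$new_script" "$dest/$rel"     # replace the support script
    chmod 775 "$dest/$rel"               # make sure it is executable
}

# Example (hypothetical paths):
# patch_trinity_copy /path/to/trinityrnaseq-v2.14.0 "$HOME/trinityrnaseq-v2.14.0-patched" \
#     ./prep_rnaseq_alignments_for_genome_assisted_assembly.pl
```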

Please let me know how it goes.

best,

~b

JuanCamargoTavares commented 1 year ago

It's working now, thanks!

cheers, Juan

brianjohnhaas commented 1 year ago

Great! Thanks!

~b
