mtisza1 / Cenote-Taker2

Cenote-Taker2: Discover and Annotate Divergent Viral Contigs (Please use Cenote-Taker 3 instead)
MIT License

find : Argument list too long #9

Closed DarrenObbard closed 2 years ago

DarrenObbard commented 3 years ago

Hi!

I get an "argument list too long" problem with line 431 of cenote-taker2.1.2.sh

cat $( find * -maxdepth 0 -type f -name "*.AA.sorted.fasta" ) > all_large_genome_proteins.AA.fasta

probably because that * is expanding to something very large.

I don't understand the reason for the construction "find * -maxdepth 0". Why not just "find " or "find -maxdepth 1"? Both seem to work.
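
For illustration, here is a minimal sketch of the alternative being suggested, assuming a find that supports -maxdepth (GNU and BSD find do; it is not in POSIX). The demo directory and file names are made up, and this is not necessarily what the script ended up using. The point is that the quoted pattern is matched by find itself, so the shell never builds a huge argument list, and "-exec ... {} +" hands the files to cat in batches that stay under the system limit.

```shell
# Hypothetical demo directory and file names, for illustration only.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
printf '>a\nMK\n' > one.AA.sorted.fasta
printf '>b\nMV\n' > two.AA.sorted.fasta
mkdir sub
printf '>c\nMT\n' > sub/three.AA.sorted.fasta   # one level down, should be skipped

# The limit being hit: total bytes allowed for arguments plus environment.
echo "ARG_MAX: $(getconf ARG_MAX) bytes"

# No shell glob here: find walks the current directory (depth 1 only) and
# matches the quoted pattern itself; -exec ... + batches the file names.
find . -maxdepth 1 -type f -name "*.AA.sorted.fasta" -exec cat {} + \
    > all_large_genome_proteins.AA.fasta

# Number of sequences collected from the top-level files.
n_records=$(grep -c '^>' all_large_genome_proteins.AA.fasta)
echo "records: $n_records"
```

The output file name does not match the "*.AA.sorted.fasta" pattern, so find does not feed the output back into cat.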

Regards!

D

mtisza1 commented 3 years ago

Thanks again for opening an issue. I had someone else reach out to me with a similar issue. I'm working to fix this right now, and I'll let you know when I've worked out the kinks.

For a quick fix, you can reduce the number of input sequences by splitting your fasta file. Something like this should work (it drops sequences shorter than 1,000 nt from a file named MY_CONTIGS.fasta, then splits the rest into files of 10,000 sequences each):

bioawk -c fastx '{ if (length($seq)>=1000) {print ">"$name ; print $seq }}' MY_CONTIGS.fasta | awk 'BEGIN {n_seq=0;} /^>/ {if(n_seq%10000==0){file=sprintf("MY_SPLIT_SEQUENCES_%d.fasta",n_seq);} print >> file; n_seq++; next;} { print >> file; }'

Regards,

Mike

DarrenObbard commented 3 years ago

I'm running cenote-taker2 on around 250 datasets, so it would be a certain amount of hassle to add in extra steps to split this file up and then run it on each of the split files.

Is there a particular reason my suggestion above won't work? (As I say, I don't understand the purpose of the find options you've used.)

mtisza1 commented 3 years ago

Darren,

I do think your suggestions will work, but I use the find * -maxdepth 0 construction dozens of times, so I'm worried that changing all of these instances will break something unexpectedly. I'm therefore testing the change on many datasets before I push an update (I believe it did cause an issue in one place). Life has been getting in the way a little bit, but I hope to have this fixed soon.
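
One general difference between the two forms that can matter in a swap like this (an illustration only, not a diagnosis of the issue mentioned above): "find *" prints bare file names, while "find . -maxdepth 1" prints them with a leading "./". Any downstream step that builds names from the output would see different strings. A minimal sketch, with made-up file names:

```shell
# Hypothetical demo directory and file names, for illustration only.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch contig1.fasta contig2.fasta

# Glob form: the shell expands * before find starts, so names come back bare.
glob_out=$(find * -maxdepth 0 -type f -name "*.fasta" | sort)

# Directory-walk form: find does the matching; each name carries a ./ prefix.
walk_out=$(find . -maxdepth 1 -type f -name "*.fasta" | sort)

# Stripping the prefix with sed makes the two listings identical.
stripped_out=$(find . -maxdepth 1 -type f -name "*.fasta" | sed 's|^\./||' | sort)

printf 'glob form:\n%s\nwalk form:\n%s\n' "$glob_out" "$walk_out"
```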

Mike

mtisza1 commented 3 years ago

Hi Darren,

I just pushed some updates (v2.1.3) which should fix the find command issues. I've also included new RdRp HMMs, in part from the sequences you sent me a while ago. Thanks so much for the help. Please do:

conda activate cenote-taker2_env
cd Cenote-Taker2
git pull
python update_ct2_databases.py --hmm True

I hope this works for you.

Cheers,

Mike

DarrenObbard commented 3 years ago

Great! Thanks, trying it now!

DarrenObbard commented 3 years ago

Hi! In version 2.1.3 I still get one of these when it's clearing up:

removing ancillary files
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.3.sh: line 2181: /usr/bin/rm: Argument list too long

mtisza1 commented 3 years ago

OK, I've changed the rm commands at the end of the script to use find, so I hope this is resolved. Please do:

cd Cenote-Taker2
git pull
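
As a rough sketch of what replacing an rm glob with find can look like (the file names and pattern here are hypothetical; the actual patterns used at the end of the script are not shown in this thread): instead of "rm *.something", which passes every match to rm as an argument, find matches the quoted pattern itself and removes each file as it goes. This assumes a find with -delete (GNU and BSD); "-exec rm -f {} +" is the more portable spelling.

```shell
# Hypothetical demo directory and ancillary file names, for illustration only.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch a.tmp.fasta b.tmp.fasta keep.fasta

# Equivalent in effect to: rm *.tmp.fasta
# but no glob is expanded by the shell, so the argument limit is never reached.
find . -maxdepth 1 -type f -name "*.tmp.fasta" -delete

ls
```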