qichao1984 / NCyc

Unsure how to get started... #14

Open maracashay opened 4 years ago

maracashay commented 4 years ago

Hi there,

Could you please walk me through how to use this tool? I'm working on Mac Catalina and just installed perl.

Should I copy/paste the perl script into my terminal? Do I need to pre-download Diamond?

Many thanks!

MaryoHg commented 4 years ago

Hi, @maracashay.

  1. git clone the repository to a defined location on your computer and install DIAMOND from source, or simply download the binary for OSX. USEARCH is not an option unless you pay for the 64-bit version (Catalina no longer runs 32-bit apps).

  2. Edit the main NCyc script to set the full path of the DIAMOND aligner. Open it with TextEdit, nano, or vim; your choice.

  3. Follow the instructions on the tutorial.

Note: Even with DIAMOND, check whether your laptop can provide the computing power the run will need.

Good Luck.

maracashay commented 4 years ago

Thanks for your quick response!

I have downloaded Diamond and I edited the script to include the full path of the Diamond aligner.

Now, I'm getting this error:

perl ~/Downloads/NCyc-Perl-1.pl -d ~/A-Hedge-Megahit-Output -m diamond -f fa -s nucl -si ~/Downloads/NCyc-Param.txt -o NCyc-A-Hedge
/Users/maraslaptop/diamond2: /Users/maraslaptop/diamond2: cannot execute binary file
/Users/maraslaptop/diamond2: /Users/maraslaptop/diamond2: cannot execute binary file
cp: /Users/maraslaptop/A-Hedge-Megahit-Output/final.contigs.diamond: No such file or directory

I attached my edited script here. NCyc-Perl.1.txt

I must be missing something simple. Any help is GREATLY appreciated!

MaryoHg commented 4 years ago

Hi again @maracashay ,

First of all, the path assigned to my $diamond must be the binary itself, i.e., /Users/maraslaptop/diamond2/diamond, rather than the folder that contains the binary.

Then, make sure the diamond binary has execute (+x) permission: chmod +x /Users/maraslaptop/diamond2/diamond
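The two checks above (the path points at a file rather than a folder, and the file is executable) can be sketched as a small shell function. This is only an illustration: check_aligner is a hypothetical helper name, not part of NCyc or DIAMOND, and it does not detect a binary built for the wrong operating system.

```shell
# Hypothetical helper (not part of NCyc): report why a path cannot be
# used as the aligner executable in the "my $diamond" setting.
check_aligner() {
    path="$1"
    if [ ! -e "$path" ]; then
        # Nothing exists at this location
        echo "missing: $path"
        return 1
    elif [ -d "$path" ]; then
        # The path is the folder that contains the binary
        echo "directory, not a binary: $path"
        return 1
    elif [ ! -x "$path" ]; then
        # The file exists but lacks the execute bit
        echo "not executable (try chmod +x): $path"
        return 1
    fi
    echo "ok: $path"
}
```

For example, check_aligner /Users/maraslaptop/diamond2 would report a directory if diamond2 is a folder, while check_aligner /Users/maraslaptop/diamond2/diamond should report ok once chmod +x has been applied.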

Finally, do not forget the tsv file that lists each sample name (without the fastq or fna extension; try that first and, if it fails, add the extension and run again) alongside the number of reads in that file. See below.

Assembly1 230130
Assembly2 330200
Assembly3 100000
Assembly4 150000
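A file like the one above could be generated rather than typed by hand. The sketch below is a hypothetical helper (make_sampleinfo is not part of NCyc) and rests on two assumptions: the inputs are FASTA files, so each sequence is counted by its ">" header line, and the columns are tab-separated, as "tsv" suggests. FASTQ input would need a different count.

```shell
# Hypothetical helper (not part of NCyc): print "name<TAB>count" for every
# FASTA file with the given extension in a directory.
# Assumption: one sequence per ">" header line.
make_sampleinfo() {
    dir="$1"
    ext="$2"
    for f in "$dir"/*."$ext"; do
        # Skip the unexpanded pattern when no file matches
        [ -e "$f" ] || continue
        # Sample name is the file name without its extension
        name=$(basename "$f" ".$ext")
        # grep -c prints 0 but exits non-zero on no match, hence "|| true"
        count=$(grep -c '^>' "$f" || true)
        printf '%s\t%s\n' "$name" "$count"
    done
}
```

With the command shown earlier in the thread, this might be used as make_sampleinfo ~/A-Hedge-Megahit-Output fa > ~/Downloads/NCyc-Param.txt, and the result checked by eye before passing it to -si.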

P.S. When I first ran this script, I used a single file from my metagenome assembly until I got it working. You can use a subset of your sequences if the file is too big.

I hope this helps. Be fine. :)

MaryoHg.

jianshu93 commented 4 years ago

The author suggests using unassembled reads rather than assembled contigs for annotating N genes (https://github.com/qichao1984/NCyc/issues/13), which I think deserves more consideration.

Hope this helps,

Thanks,

Jianshu

xiaorui1996 commented 4 years ago

Hi, I also want to use contigs for N gene annotation, and I used prokka to convert contigs.fa to nucleotide.fa. However, something always goes wrong with the $sampleinfo file; perhaps contigs cannot be annotated? Any suggestions would be appreciated!

jianshu93 commented 4 years ago

Just use prodigal to predict genes. Prokka actually uses prodigal and then annotates with blastp. I would suggest using prodigal and then diamond/usearch for annotation.

ZehanDai commented 4 years ago

Thanks for the sample; the -si parameter is poorly documented. Its description is too abstract to understand.