steuernb / NLR-Annotator

GNU General Public License v3.0

Overlapping NLRs regions #13

Closed. SuryaHembrom closed this issue 4 years ago

SuryaHembrom commented 4 years ago

Hi! I cannot figure out why NLR-Annotator is reporting two NLRs in the same region in the .bed and .txt files.

Instance 1: sample LA4332_X1, 2 partial genes
LA4332_X1_opera_scaffold_3643_pilon LA4332_X1_opera_scaffold_3643_pilon_nlr_1 partial 900 1761 - 1,4,5,10,3,12,7
LA4332_X1_opera_scaffold_3643_pilon LA4332_X1_opera_scaffold_3643_pilon_nlr_2 partial 34 1761 - 1,4,5,10,3,12,7

Instance 2: sample LA4329_NO2, 1 complete gene and 1 partial pseudogene
LA4329_NO2_opera_scaffold_334_pilon LA4329_NO2_opera_scaffold_334_pilon_nlr_1 complete 1818 3619 - 17,16,1,6,4,5,10,3,12,11,11
LA4329_NO2_opera_scaffold_334_pilon LA4329_NO2_opera_scaffold_334_pilon_nlr_2 partial (pseudogene) 400 3619 - 17,16,1,6,4,5,10,3,12,2,8,7

Instance 3: sample LA4118_N02, 2 complete genes
LA4118_N02_opera_scaffold_459_pilon LA4118_N02_opera_scaffold_459_pilon_nlr_1 complete 554 3157 + 1,6,4,5,10,3,2,8,7,9,11
LA4118_N02_opera_scaffold_459_pilon LA4118_N02_opera_scaffold_459_pilon_nlr_2 complete 1219 3157 + 1,6,4,5,10,3,2,8,7,9,11

Instance 4: sample LA4118_N02, 2 complete genes with identical coordinates, named _nlr_1 and _nlr_2 respectively
LA4118_N02_opera_scaffold_855_pilon LA4118_N02_opera_scaffold_855_pilon_nlr_1 complete 198 2659 - 1,6,4,5,10,3,12,2,8,7,9,11
LA4118_N02_opera_scaffold_855_pilon LA4118_N02_opera_scaffold_855_pilon_nlr_2 complete 198 2659 - 1,6,4,5,10,3,12,2,8,7,9,11

Are the detected genes biologically relevant, i.e. can the tool find two genes of different lengths in the same region, as in instances 1, 2 and 3? Or does the same region contain the motifs of two different genes, which is why it gives such results? Or is it a technical issue?

And in instance 4, why does it report the same gene, with the same coordinates, twice? How is the 7th column generated?

The command line was:

java -jar $NLRAnnotator_dir/ChopSequence.jar -i $denovoref -o ${file}.chop.fasta -l 20000 -p 5000
meme_dir=/data/home/students/s.hembrom/meme-5.1.0
java -jar $NLRAnnotator_dir/NLR-Parser3.jar -y $meme_dir/bin/mast -x $NLRAnnotator_dir/meme.xml -i ${file}.chop.fasta -c ${file}.nlr.xml
java -jar $NLRAnnotator_dir/NLR-Annotator.jar -i ${file}.nlr.xml -o ${file}.nlr.txt -g ${file}.nlr.gff -b ${file}.nlr.bed -m ${file}.nlr.motifs.bed -a ${file}.nbarkmotifalignment.fasta -f $denovoref ${file}.nlr.fasta 120 flanking

Thanks a lot for any help.

steuernb commented 4 years ago

Hi, just to clarify, NLR-Annotator is not able to identify genes. What it detects are loci in genomic sequence that have signatures of NLRs. Without RNA-seq support, it cannot be determined whether such a locus overlaps with a functional NLR gene.

NLR-Annotator first picks up signatures of the NB-ARC domain, then looks for motifs of LRRs or TIR/CC to extend the locus. In your case, two distinct NB-ARC domain seeds have been found that are very close together. For both of them, the same motif was then found that is close enough and would fit. This means NLR-Annotator can associate the sequence with NLRs but cannot determine the structure properly.

It could be that one of the NB-ARCs is an integrated domain, it could be that one of them is a distinct locus, or it might be that the NB-ARC domain is similar to RGA2. You will find out with manual annotation. I don't know of any tool out there that could figure this out. I am happy to have a look at and discuss a few of your sequences if it is important for you to know the exact structure.

All the best,
Burkhard
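
For anyone who wants to list such cases before manual annotation, here is a minimal Python sketch that flags loci sharing a region in the .txt output. It is not part of NLR-Annotator. It assumes the file is tab-separated with the seven columns shown in the rows above (sequence, locus name, class, start, end, strand, motif list), that coordinates are inclusive, and that only loci on the same sequence and strand should be compared. The names `Locus`, `parse_loci` and `overlapping_pairs` are made up for this example.

```python
from collections import defaultdict
from typing import NamedTuple


class Locus(NamedTuple):
    scaffold: str
    name: str
    nlr_class: str
    start: int
    end: int
    strand: str
    motifs: str


def parse_loci(lines):
    """Parse rows assumed to be tab-separated with seven columns."""
    loci = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # skip blank or short lines
        scaffold, name, nlr_class, start, end, strand, motifs = fields[:7]
        if not (start.isdigit() and end.isdigit()):
            continue  # skip anything that is not a data row
        loci.append(Locus(scaffold, name, nlr_class,
                          int(start), int(end), strand, motifs))
    return loci


def overlapping_pairs(loci):
    """Return (name_a, name_b, kind) for loci that share a region.

    kind is "identical" when both coordinates match, else "overlapping".
    """
    groups = defaultdict(list)
    for locus in loci:
        groups[(locus.scaffold, locus.strand)].append(locus)

    pairs = []
    for group in groups.values():
        group.sort(key=lambda l: (l.start, l.end))
        for i, a in enumerate(group):
            for b in group[i + 1:]:
                if b.start > a.end:
                    break  # sorted by start, so no later locus can overlap a
                same = (a.start, a.end) == (b.start, b.end)
                pairs.append((a.name, b.name,
                              "identical" if same else "overlapping"))
    return pairs


# Rows taken from instances 3 and 4 in this thread.
rows = [
    "\t".join(["LA4118_N02_opera_scaffold_459_pilon", "LA4118_N02_opera_scaffold_459_pilon_nlr_1",
               "complete", "554", "3157", "+", "1,6,4,5,10,3,2,8,7,9,11"]),
    "\t".join(["LA4118_N02_opera_scaffold_459_pilon", "LA4118_N02_opera_scaffold_459_pilon_nlr_2",
               "complete", "1219", "3157", "+", "1,6,4,5,10,3,2,8,7,9,11"]),
    "\t".join(["LA4118_N02_opera_scaffold_855_pilon", "LA4118_N02_opera_scaffold_855_pilon_nlr_1",
               "complete", "198", "2659", "-", "1,6,4,5,10,3,12,2,8,7,9,11"]),
    "\t".join(["LA4118_N02_opera_scaffold_855_pilon", "LA4118_N02_opera_scaffold_855_pilon_nlr_2",
               "complete", "198", "2659", "-", "1,6,4,5,10,3,12,2,8,7,9,11"]),
]

for name_a, name_b, kind in overlapping_pairs(parse_loci(rows)):
    print(name_a, name_b, kind)
```

On these four rows the scaffold_459 pair is reported as overlapping and the scaffold_855 pair as identical. To run it on a real file, pass an open file handle to `parse_loci` instead of `rows`. The flagged pairs are only candidates for manual inspection; the script says nothing about which NB-ARC seed is the real one.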

SuryaHembrom commented 4 years ago

Hi, thanks for the clarification. I am aligning protein sequences to the candidate NLR regions detected by NLR-Annotator, using other software such as exonerate, and that is working fine.

Regards.