Closed MinjieHu closed 6 years ago
Would you be able to share the log files so I can see the entire log for predict as well as braker?
Of course. braker.log funannotate-p2g.log funannotate-predict.log
Looks like you are running v1.0.0, can you upgrade to the newest version as quite a bit has changed especially with the RNA-seq modules.
Thanks for the reply. I will try with the 1.3.3 version.
By the way, is it OK to run predict with a STAR alignment-based BAM file?
Yeah, I think it should be okay with STAR, although running PASA is also quite helpful. You can also run that separately and pass the TransDecoder-filtered PASA models to the predict script.
Great. Predict works now. Thanks for the help. But for train, the progress from last night is 2.51%, and right now, it's still 2.51%
Any clues in the Trinity log file? It should be the 'training/trinity_gg.log' file.
It just shows "All commands completed successfully. :-)". It has already produced a BAM file, "hisat2.coordSorted.bam", which is 2.2G in size, while in my STAR-based alignment the sorted BAM file is 3.2G. I have no idea whether it completed or not. I have attached all the log files: funannotate-train.log Trinity-gg.log nohup.log
I'm not sure, but it doesn't look like Trinity is running? Seems like the log file is just full of this error:
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = (unset),
LC_ALL = (unset),
LANG = "zh_CN.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
perl: warning: Setting locale failed.
perl: warning: Setting locale failed.
perl: warning: Setting locale failed.
perl: warning: Setting locale failed.
Perhaps addressing that error will allow it to run.
You are right. When I look into the detail of the Trinity-gg.log, before the perl warning appeared, it shows
###################################################################
## Stopping here due to --no_distributed_trinity_exec in effect ##
###################################################################
But I still don't know how to deal with it.
Shouldn't it just be setting the correct environmental variable? https://stackoverflow.com/questions/2499794/how-to-fix-a-locale-setting-warning-from-perl
By the way, I tried skipping the update step and went straight to the annotation step. It finished the annotation step successfully, but a lot of genes lack a typical gene name. Do you only assign a gene name when it has a high-confidence BLAST hit?
Did you run EggNog mapper as well? Many if not most genes won't have names/product descriptions. It will pull EggNog mapper names as well as UniProt/Swiss-Prot names (60% identity over 60% of the protein), so it is designed to be conservative.
I fixed the warning of perl. But when I run update, the problem is still there.
Trinity version: Trinity-v2.5.1
Tuesday, May 29, 2018: 17:55:10 CMD: /mnt/sequence/mhu2/miniconda2/opt/trinity-2.5.1/util/support_scripts/ensure_coord_sorted_sam.pl funannotate/update_misc/hisat2.coordSorted.bam
** NOTE: Latest version of Trinity is Trinity-v2.6.6, and can be obtained at:
https://github.com/trinityrnaseq/trinityrnaseq/releases
-appears to be a coordinate sorted bam file. ok.
samtools view: writing to standard output failed: Broken pipe
samtools view: error closing standard output: -1
Tuesday, May 29, 2018: 17:55:10 CMD: java -Xmx64m -XX:ParallelGCThreads=2 -jar /mnt/sequence/mhu2/miniconda2/opt/trinity-2.5.1/util/support_scripts/ExitTester.jar 0
Tuesday, May 29, 2018: 17:55:11 CMD: java -Xmx64m -XX:ParallelGCThreads=2 -jar /mnt/sequence/mhu2/miniconda2/opt/trinity-2.5.1/util/support_scripts/ExitTester.jar 1
----------------------------------------------------------------------------------
-------------- Trinity Phase 1: Clustering of RNA-Seq Reads ---------------------
----------------------------------------------------------------------------------
Tuesday, May 29, 2018: 17:55:11 CMD: samtools index /mnt/sequence/mhu2/git/mhu2-xelongata/nanopore/final/polished/funannotate/update_misc/hisat2.coordSorted.bam
###################################################################
## Stopping here due to --no_distributed_trinity_exec in effect ##
###################################################################
All commands completed successfully. :-)
All commands completed successfully. :-)
All commands completed successfully. :-)
All commands completed successfully. :-)
All commands completed successfully. :-)
Based on the manual, if EggNog mapper is installed it will run automatically, right? I think I have already installed EggNog mapper: I can run emapper.py directly from my bash shell.
What version of samtools are you using, seems like that is also causing an error in Trinity.
Per eggnog: yes, as long as emapper.py is in the $PATH, it should run the analysis during funannotate annotate.
Samtools Version: 1.7
Okay, well I think versions of Trinity < 2.6 use a very old version of samtools packaged with Trinity, while newer versions use your system samtools, so you may want to look on the Trinity page for help on your installation, i.e. ensure that you can run the Trinity sample data, etc. I think the most recent version is 2.6.6, so you might consider upgrading.
You are great! After I updated Trinity, it works. Unfortunately, it failed at the PASA step: it reported that it can't find the PASA config.txt file. I also looked into the template config file, and it seems to configure MySQL. I tried to install and configure MySQL, but that is hard for me to do without root privileges. Is MySQL really necessary, or can I just use SQLite instead? Thanks again for the great help!
In funannotate v1.3.0 and newer, the default will try to use SQLite; in fact, you have to specify --pasa_db mysql for it to run MySQL (and I see that the menu has not been updated to reflect this). You will also need to have the most recent version of PASA for it to be able to use SQLite. Why don't you move into the PASA install directory and run the packaged tests? There is both a test for SQLite and one for MySQL.
OK, I will give it a try. By the way, is it fine to just copy the PASA template config file as the config.txt file?
It's only needed if you are using MySQL -- see here https://github.com/PASApipeline/PASApipeline/wiki/Pasa_installation_instructions
Finally, I succeeded in running update and annotate. In the annotate step it did run Eggnog-mapper, but it finished in less than 1 minute. I only get 442 gene/product names passed.
[07:07 PM]: 1,204 valid gene/product annotations from 1,626 total
[07:07 PM]: Running Eggnog-mapper
[07:07 PM]: No Eggnog-mapper results found.
The version I am using is from anaconda. It should be version 1.03 according to anaconda, but when I run emapper.py --version it shows emapper-a9fda72
There is a uniprot_eggnog_raw_names.txt which has 1204 lines in annotate_misc folder. It seems eggnog indeed ran.
That's not the output of eggnog -- it's a raw summary of the gene names/products that were parsed (so this file is created even if eggnog isn't run).
The problem is likely the diamond database: the diamond database distributed with eggnog-mapper was created with an old version of diamond and isn't compatible (v0.8x is not compatible with v0.9x). You can fix this by doing the following:
#navigate to the eggnog-mapper/data folder
#extract protein fasta files and then re-construct diamond database
mv eggnog_proteins.dmnd eggnog_proteins_old.dmnd
diamond getseq --db eggnog_proteins_old.dmnd | diamond makedb --db eggnog_proteins.dmnd
rm eggnog_proteins_old.dmnd
Also, the funannotate-annotate.log file should have more information about what the error potentially was. A successful run in the log file looks like this:
[06/01/18 11:16:33]: emapper.py -m diamond -i /Users/jon/funannotate/sample_data/genome3/annotate_misc/genome.proteins.fasta -o eggnog --cpu 6
[06/01/18 11:46:31]: # emapper-1.0.3
# ./emapper.py -m diamond -i /Users/jon/funannotate/sample_data/genome3/annotate_misc/genome.proteins.fasta -o eggnog --cpu 6
/Users/jon/miniconda2/bin/diamond blastp -d /Users/jon/software/eggnog-mapper/data/eggnog_proteins.dmnd -q /Users/jon/funannotate/sample_data/genome3/annotate_misc/genome.proteins.fasta --more-sensitive --threads 6 -e 0.001000 -o /Users/jon/funannotate/sample_data/genome3/annotate_misc/emappertmp_dmdn_Z7MUcn/b8a125cec30b411f911799dead6693cc --top 3
Functional annotation of refined hits starts now
Processed queries:147 total_time:2.02165222168 rate:72.71 q/s
Done
eggnog.emapper.seed_orthologs
eggnog.emapper.annotations
Total time: 1797.69 secs
And then you should have this output file: annotate_misc/eggnog.emapper.annotations.
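A quick way to confirm that Eggnog-mapper actually produced results is to count the data rows in that output file. The following is only a minimal sketch: the helper name `count_eggnog_hits` is hypothetical, and it assumes that comment/header lines in the file start with `#`.

```python
from pathlib import Path


def count_eggnog_hits(path) -> int:
    """Count non-empty, non-comment lines in an emapper annotations file.

    Returns 0 if the file does not exist, which is what you would see
    when the log reports "No Eggnog-mapper results found."
    Assumption: header/comment lines start with '#'.
    """
    p = Path(path)
    if not p.is_file():
        return 0
    with p.open() as handle:
        return sum(1 for line in handle if line.strip() and not line.startswith("#"))


if __name__ == "__main__":
    # Path taken from the thread; adjust to your own output folder.
    print(count_eggnog_hits("annotate_misc/eggnog.emapper.annotations"))
```

If this prints 0 even though emapper.py is on the $PATH, the funannotate-annotate.log file is the next place to look.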
Thanks very much! It works now!
When I look into the details of the annotation, I find that only 3,185 gene/product names passed, while the total number of predicted protein-coding genes is 26,791. Is such a low fraction normal? Can I change some parameters to increase the number of gene names that pass?
I compared the eggnog mapper annotation with the final annotation, and found that a lot of genes with an eggnog name are assigned as hypothetical protein in the final annotation.
Yes -- many of those names are invalid and are filtered out. For example many are locus_tags from another organism, i.e.:
FG04299.1
FGSG_01554
PGUG_02518
.
.
These are not valid gene names (really they shouldn't be in EggNog; likely they are placeholders until a formal name is assigned). So the current criteria for filtering the gene names are: 1) cannot contain an underscore 2) cannot contain a period 3) at least 3 characters 4) at least one character has to be a number 5) but not more than 3 numbers
Some 'valid' names are probably dropped with this filtering, but it is necessary to remove all of the invalid names. As always, if somebody has a method to improve this that would be great.
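To make the five criteria concrete, here is a minimal sketch of such a filter. It is not the actual funannotate code: the function name `is_valid_gene_name` is hypothetical, and `ACT1` and `CDC42` are illustrative names that do not come from this thread.

```python
def is_valid_gene_name(name: str) -> bool:
    """Apply the five filtering criteria described above to one candidate name."""
    digits = sum(ch.isdigit() for ch in name)
    return (
        "_" not in name       # 1) cannot contain an underscore
        and "." not in name   # 2) cannot contain a period
        and len(name) >= 3    # 3) at least 3 characters
        and digits >= 1       # 4) at least one character is a number
        and digits <= 3       # 5) but not more than 3 numbers
    )


# The three locus_tag-style names from the thread, plus two illustrative names.
candidates = ["FG04299.1", "FGSG_01554", "PGUG_02518", "ACT1", "CDC42"]
kept = [name for name in candidates if is_valid_gene_name(name)]
print(kept)  # ['ACT1', 'CDC42']
```

The three locus_tag-style names are rejected by the period or underscore rule alone, which is the case the filter is mainly meant to catch.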
Thanks for the quick response. I understand your concern. But in my case right now, I need much more functional annotation for my downstream single-cell RNA-seq analysis. So I tried to skip your filter criteria by commenting out the if line
if not '_' in cols[Genei] and not '.' in cols[Genei] and number_present(cols[Genei]):
But I still failed to get more gene names. Is there somewhere else I missed?
Gene names aren't really that important for functional annotation (at least in my opinion), but assuming you also ran interproscan and the rest of the tools, you should have functional annotation for many proteins (if you don't, it might mean that the prediction step didn't work as planned). For fungi (the organisms I work on) I would typically only expect to get gene names for ~15-20% of the genes, as 60-80% of most fungal genes are "hypothetical" and there generally isn't a known function.
You can also do some of this manually, i.e. if you had a closely related well-annotated genome, you could transfer gene names/annotation by identifying orthologs between the two genomes and then add your desired gene names using the -a, --annotations option. You could also parse the eggnog mapper file if you wanted to; you just need to generate a 3 column TSV file to pass to -a.
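A rough sketch of that parsing step follows. Several things here are assumptions rather than facts from this thread: that the predicted gene name sits in the fifth tab-separated column of the emapper annotations file (`name_col=4`), that header lines start with `#`, and that the three columns expected by -a are ID, annotation type, and value, with `name` as the type. Check both against your own files and the funannotate documentation before using it. The function names and sample IDs are hypothetical.

```python
import csv
import io
from typing import Iterable, Iterator, Tuple


def eggnog_to_annotations(
    lines: Iterable[str], name_col: int = 4
) -> Iterator[Tuple[str, str, str]]:
    """Yield (id, 'name', gene_name) rows from emapper annotation lines.

    Assumptions: tab-separated input, '#' marks comment lines, and the
    predicted gene name is in column `name_col` (0-based).
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) <= name_col:
            continue  # row too short to carry a gene name
        gene_name = cols[name_col].strip()
        if gene_name:
            yield (cols[0], "name", gene_name)


def write_tsv(rows: Iterable[Tuple[str, str, str]], handle) -> None:
    """Write the rows as a 3 column TSV file."""
    csv.writer(handle, delimiter="\t", lineterminator="\n").writerows(rows)


if __name__ == "__main__":
    # Made-up sample rows; real input would be eggnog.emapper.annotations.
    sample = io.StringIO(
        "#query_name\tseed\tevalue\tscore\tpredicted_gene_name\n"
        "gene1-T1\tseedA\t1e-50\t200\tACT1\n"
        "gene2-T1\tseedB\t1e-20\t90\t\n"
    )
    out = io.StringIO()
    write_tsv(eggnog_to_annotations(sample), out)
    print(out.getvalue(), end="")
```

Rows without a predicted gene name are skipped, so the resulting file only contains names you are asking funannotate to add.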
Thanks for the suggestion. I have another question. In the annotate_misc/all.annotations.txt file I get ~15000 genes with name information, but only ~5000 names in the final annotation, and this difference seems to come from the tbl2asn step. Do you do some other filtering during this step?
closing this because there are several issues in here, if one/more issues arise please open a new ticket with a single issue per thread.
Dear Jon, After I succeeded in running the sample data, I still failed to run my own assembly. When I followed the tutorial to run
funannotate train -i xenia.contigs.fasta -o fun -s ../../../transcriptome_based_first3_genome/coral_RNA.fastq --species "Xenia" --cpus 18
It always stalled at [01:50:15 PM]: Assembling 120,054 Trinity clusters using 17 CPUs, Progress: 10.20%
So I tried providing a BAM file by using STAR to align my RNA-seq data to my assembly, and ran funannotate predict. It still failed with log. There is no augustus-parallel.log in the logfile folder. And when I look into the braker.log, I also didn't find an obvious error message, and the last few lines are
And it's the same case for the gmes.log file. The last several lines are
Thanks for the help!