mothur / mothur

Welcome to the mothur project, initiated by Dr. Patrick Schloss and his software development team in the Department of Microbiology & Immunology at The University of Michigan. This project seeks to develop a single piece of open-source, expandable software to fill the bioinformatics needs of the microbial ecology community.
www.mothur.org
GNU General Public License v3.0

How to compile the mothur C++ source code and generate an exe #295

Closed ghost closed 7 years ago

ghost commented 7 years ago

How do I compile the mothur C++ source code into an exe? What I want to do is first compile the mothur source into an exe and use it to process high-throughput data, then change the code of a mothur command to process the data in different ways.

mothur-westcott commented 7 years ago

You can compile mothur using the makefile provided with the source code using g++. You will need to set the parameters to indicate your system requirements, and if you would like to be able to read *.gz files in the make.contigs command you will need to install boost.

In the makefile, these are the parameters you need to set:

64BIT_VERSION ?= yes
OPTIMIZE ?= yes
USEREADLINE ?= yes
USEBOOST ?= yes
BOOST_LIBRARY_DIR="\"Enter_your_boost_library_path_here\""
BOOST_INCLUDE_DIR="\"Enter_your_boost_include_path_here\""
MOTHUR_FILES="\"Enter_your_default_path_here\""
RELEASE_DATE = "\"8/9/2016\""
VERSION = "\"1.38.1\""

If you do not want to install or use boost, set USEBOOST ?= no; you then do not have to set the location of the boost library or include directories.

The MOTHUR_FILES parameter is also optional, but allows you to set a default location for mothur to look for files it can't find. This is often used for reference files you want to store in one location separate from your data.

prehensilecode commented 7 years ago

Re MOTHUR_FILES - these are data files, rather than other scripts or executables? I'm just a sysadmin trying to compile this for some users.

May I suggest adding your comments above to the Makefile.

mothur-westcott commented 7 years ago

The MOTHUR_FILES option is used to set a compile time location for mothur to look for input files. It is optional. It is most often used when reference files are stored in one central location. For example, you could set this location to:

MOTHUR_FILES="\"/usr/local/bin/mothur_reference_files\""

When mothur is run, if a file can't be found, mothur will check this default location.

mothur > align.seqs(fasta=./mydataset/myfasta, reference=silva.v4.fasta)

If /usr/local/bin/mothur_reference_files/ contains the silva.v4.fasta file, then mothur will find it without the user having to specify the path.
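The fallback described above can be pictured with a small Python sketch. This is only an illustration of the lookup order, not mothur's actual code: the function name `resolve_input` is hypothetical, and the assumption that only the file's base name is tried in the default directory is mine.

```python
from pathlib import Path
from typing import Optional


def resolve_input(name: str, default_dir: str) -> Optional[Path]:
    """Return the file as given if it exists, else look in default_dir.

    Hypothetical helper: mimics the idea of a compile-time default
    location (MOTHUR_FILES), not mothur's real implementation.
    """
    candidate = Path(name)
    if candidate.exists():
        # The user-supplied path works, so use it unchanged.
        return candidate
    # Assumption: only the base name is looked up in the default location.
    fallback = Path(default_dir) / candidate.name
    if fallback.exists():
        return fallback
    # Not found in either place.
    return None
```

With the example above, `resolve_input("silva.v4.fasta", "/usr/local/bin/mothur_reference_files")` would return the copy in the reference directory when no `silva.v4.fasta` exists in the working directory.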

I will add some comments to the makefile.

mothur-westcott commented 7 years ago

Updates included in version 1.39.5 https://github.com/mothur/mothur/releases/tag/v1.39.5

VladimirAv commented 6 years ago

Dear miss Westcott,

I'm having a lot of trouble compiling the mothur 1.39.5 source code you posted here. I don't have much programming experience, but I ran 'make' in a terminal from inside the mothur folder (on Linux) and I keep getting this:

g++ -Lsource/ -Lsource/calculators/ -Lsource/chimera/ -Lsource/classifier/ -Lsource/clearcut/ -Lsource/commands/ -Lsource/communitytype/ -Lsource/datastructures/ -Lsource/metastats/ -Lsource/randomforest/ -Lsource/read/ -Lsource/svm/ -o mothur source/gotohoverlap.o source/sharedutilities.o source/validparameter.o source/opticluster.o source/heatmap.o source/nast.o source/completelinkage.o source/randomnumber.o source/mothurout.o source/zlib.o source/averagelinkage.o source/seqnoise.o source/venn.o source/inputdata.o source/calcsparcc.o source/clusterclassic.o source/vsearchfileparser.o source/slibshuff.o source/collect.o source/trialSwap2.o source/nastreport.o source/fileoutput.o source/cluster.o source/gzip.o source/refchimeratest.o source/wilcox.o source/needlemanoverlap.o source/myseqdist.o source/subsample.o source/commandfactory.o source/rarecalc.o source/rarefact.o source/weightedlinkage.o source/progress.o source/linearalgebra.o source/singlelinkage.o source/trimoligos.o source/validcalculator.o source/optionparser.o source/mothur.o source/engine.o source/consensus.o source/dlibshuff.o source/heatmapsim.o source/raredisplay.o source/commandoptionparser.o source/libshuff.o source/noalign.o source/overlap.o source/calculators/sharedlennon.o source/calculators/sharedochiai.o source/calculators/sharedthetayc.o source/calculators/coverage.o source/calculators/sharedmorisitahorn.o source/calculators/sharedjest.o source/calculators/sharedthetan.o source/calculators/bergerparker.o source/calculators/whittaker.o source/calculators/qstat.o source/calculators/shen.o source/calculators/sharedjclass.o source/calculators/canberra.o source/calculators/structkulczynski.o source/calculators/sharedace.o source/calculators/parsimony.o source/calculators/smithwilson.o source/calculators/structeuclidean.o source/calculators/manhattan.o source/calculators/simpson.o source/calculators/heip.o source/calculators/mempearson.o source/calculators/logsd.o source/calculators/calculator.o 
source/calculators/sharedkstest.o source/calculators/memchord.o source/calculators/structchi2.o source/calculators/hellinger.o source/calculators/geom.o source/calculators/jackknife.o source/calculators/sharedrjsd.o source/calculators/bootstrap.o source/calculators/hamming.o source/calculators/invsimpson.o source/calculators/sharedsorclass.o source/calculators/ace.o source/calculators/boneh.o source/calculators/shannonrange.o source/calculators/npshannon.o source/calculators/unweighted.o source/calculators/chao1.o source/calculators/simpsoneven.o source/calculators/memchi2.o source/calculators/sharedchao1.o source/calculators/efron.o source/calculators/sharedsobscollectsummary.o source/calculators/spearman.o source/calculators/shannon.o source/calculators/structchord.o source/calculators/shannoneven.o source/calculators/bstick.o source/calculators/solow.o source/calculators/sharedanderbergs.o source/calculators/goodscoverage.o source/calculators/odum.o source/calculators/sharedjsd.o source/calculators/uvest.o source/calculators/memeuclidean.o source/calculators/sharedkulczynski.o source/calculators/sharedjackknife.o source/calculators/sharedjabund.o source/calculators/sharedsorest.o source/calculators/speciesprofile.o source/calculators/soergel.o source/calculators/sharedkulczynskicody.o source/calculators/sharedmarczewski.o source/calculators/sharedsorabund.o source/calculators/structpearson.o source/calculators/weighted.o source/calculators/sharedbraycurtis.o source/calculators/gower.o source/calculators/prng.o source/calculators/sharedsobs.o source/chimera/pintail.o source/chimera/myPerseus.o source/chimera/bellerophon.o source/chimera/maligner.o source/chimera/chimeraslayer.o source/chimera/slayer.o source/chimera/ccode.o source/chimera/chimerarealigner.o source/chimera/chimeracheckrdp.o source/chimera/mothurchimera.o source/chimera/decalc.o source/classifier/classify.o source/classifier/phylosummary.o source/classifier/taxonomyequalizer.o 
source/classifier/kmernode.o source/classifier/phylotree.o source/classifier/aligntree.o source/classifier/kmertree.o source/classifier/knn.o source/classifier/alignnode.o source/classifier/bayesian.o source/classifier/taxonomynode.o source/clearcut/getopt_long.o source/clearcut/clearcut.o source/clearcut/cmdargs.o source/clearcut/distclearcut.o source/clearcut/fasta.o source/clearcut/dmat.o source/commands/chimeravsearchcommand.o source/commands/sensspeccommand.o source/commands/renameseqscommand.o source/commands/distancecommand.o source/commands/listseqscommand.o source/commands/quitcommand.o source/commands/matrixoutputcommand.o source/commands/indicatorcommand.o source/commands/libshuffcommand.o source/commands/treegroupscommand.o source/commands/kruskalwalliscommand.o source/commands/getotulabelscommand.o source/commands/removeseqscommand.o source/commands/venncommand.o source/commands/makegroupcommand.o source/commands/getcoremicrobiomecommand.o source/commands/unifracunweightedcommand.o source/commands/makefilecommand.o source/commands/getgroupscommand.o source/commands/sracommand.o source/commands/setlogfilecommand.o source/commands/heatmapcommand.o source/commands/summaryqualcommand.o source/commands/screenseqscommand.o source/commands/clusterdoturcommand.o source/commands/removedistscommand.o source/commands/mimarksattributescommand.o source/commands/sffmultiplecommand.o source/commands/removerarecommand.o source/commands/parselistscommand.o source/commands/getlistcountcommand.o source/commands/mergefilecommand.o source/commands/countgroupscommand.o source/commands/heatmapsimcommand.o source/commands/clustercommand.o source/commands/degapseqscommand.o source/commands/summarytaxcommand.o source/commands/systemcommand.o source/commands/seqerrorcommand.o source/commands/normalizesharedcommand.o source/commands/aligncommand.o source/commands/helpcommand.o source/commands/otuhierarchycommand.o source/commands/mergetaxsummarycommand.o 
source/commands/chimerabellerophoncommand.o source/commands/getrelabundcommand.o source/commands/biominfocommand.o source/commands/filterseqscommand.o source/commands/removelineagecommand.o source/commands/getsharedotucommand.o source/commands/parsefastaqcommand.o source/commands/cooccurrencecommand.o source/commands/lefsecommand.o source/commands/getcurrentcommand.o source/commands/parsimonycommand.o source/commands/chimeraperseuscommand.o source/commands/mgclustercommand.o source/commands/splitabundcommand.o source/commands/primerdesigncommand.o source/commands/getrabundcommand.o source/commands/catchallcommand.o source/commands/pcrseqscommand.o source/commands/mantelcommand.o source/commands/binsequencecommand.o source/commands/makecontigscommand.o source/commands/makelookupcommand.o source/commands/rarefactcommand.o source/commands/summarysharedcommand.o source/commands/getlineagecommand.o source/commands/trimseqscommand.o source/commands/nmdscommand.o source/commands/collectsharedcommand.o source/commands/removeotulabelscommand.o source/commands/makebiomcommand.o source/commands/phylodiversitycommand.o source/commands/makelefsecommand.o source/commands/filtersharedcommand.o source/commands/pcoacommand.o source/commands/removegroupscommand.o source/commands/listotulabelscommand.o source/commands/rarefactsharedcommand.o source/commands/mergesfffilecommand.o source/commands/secondarystructurecommand.o source/commands/clearcutcommand.o source/commands/getdistscommand.o source/commands/pcacommand.o source/commands/metastatscommand.o source/commands/subsamplecommand.o source/commands/chimeraslayercommand.o source/commands/shhhercommand.o source/commands/splitgroupscommand.o source/commands/getlabelcommand.o source/commands/consensusseqscommand.o source/commands/sffinfocommand.o source/commands/corraxescommand.o source/commands/preclustercommand.o source/commands/setdircommand.o source/commands/unifracweightedcommand.o source/commands/classifytreecommand.o 
source/commands/getseqscommand.o source/commands/chimerapintailcommand.o source/commands/deuniqueseqscommand.o source/commands/getmimarkspackagecommand.o source/commands/setcurrentcommand.o source/commands/getmetacommunitycommand.o source/commands/sharedcommand.o source/commands/getcommandinfocommand.o source/commands/classifyrfsharedcommand.o source/commands/makefastqcommand.o source/commands/reversecommand.o source/commands/classifyseqscommand.o source/commands/anosimcommand.o source/commands/deuniquetreecommand.o source/commands/trimflowscommand.o source/commands/collectcommand.o source/commands/createdatabasecommand.o source/commands/sortseqscommand.o source/commands/otuassociationcommand.o source/commands/newcommandtemplate.o source/commands/setseedcommand.o source/commands/mergecountcommand.o source/commands/amovacommand.o source/commands/getsabundcommand.o source/commands/classifysvmsharedcommand.o source/commands/countseqscommand.o source/commands/nocommands.o source/commands/renamefilecommand.o source/commands/getgroupcommand.o source/commands/chimeracheckcommand.o source/commands/clustersplitcommand.o source/commands/mergegroupscommand.o source/commands/deconvolutecommand.o source/commands/getoturepcommand.o source/commands/sparcccommand.o source/commands/chimerauchimecommand.o source/commands/phylotypecommand.o source/commands/classifyotucommand.o source/commands/pairwiseseqscommand.o source/commands/clusterfragmentscommand.o source/commands/summarycommand.o source/commands/chopseqscommand.o source/commands/seqsummarycommand.o source/commands/shhhseqscommand.o source/commands/homovacommand.o source/commands/chimeraccodecommand.o source/communitytype/communitytype.o source/communitytype/pam.o source/communitytype/kmeans.o source/communitytype/qFinderDMM.o source/datastructures/listvector.o source/datastructures/fastqread.o source/datastructures/ordervector.o source/datastructures/kmer.o source/datastructures/sharedlistvector.o 
source/datastructures/optimatrix.o source/datastructures/fastamap.o source/datastructures/distancedb.o source/datastructures/alignmentcell.o source/datastructures/sharedordervector.o source/datastructures/nameassignment.o source/datastructures/suffixnodes.o source/datastructures/flowdata.o source/datastructures/tree.o source/datastructures/sparsedistancematrix.o source/datastructures/sequence.o source/datastructures/kmeralign.o source/datastructures/sparsematrix.o source/datastructures/alignmentdb.o source/datastructures/fullmatrix.o source/datastructures/blastalign.o source/datastructures/suffixdb.o source/datastructures/sabundvector.o source/datastructures/rabundvector.o source/datastructures/suffixtree.o source/datastructures/treenode.o source/datastructures/qualityscores.o source/datastructures/groupmap.o source/datastructures/sequenceparser.o source/datastructures/reportfile.o source/datastructures/sequencecountparser.o source/datastructures/oligos.o source/datastructures/counttable.o source/datastructures/treemap.o source/datastructures/kmerdb.o source/datastructures/blastdb.o source/datastructures/sharedrabundvector.o source/datastructures/sharedrabundfloatvector.o source/datastructures/sequencedb.o source/datastructures/database.o source/datastructures/designmap.o source/datastructures/sharedsabundvector.o source/datastructures/alignment.o source/metastats/mothurfisher.o source/metastats/mothurmetastats.o source/randomforest/regularizeddecisiontree.o source/randomforest/rftreenode.o source/randomforest/decisiontree.o source/randomforest/abstractdecisiontree.o source/randomforest/randomforest.o source/randomforest/forest.o source/read/splitmatrix.o source/read/readphylipvector.o source/read/readblast.o source/read/readcolumn.o source/read/readcluster.o source/read/readtree.o source/read/formatphylip.o source/read/treereader.o source/read/readphylip.o source/read/formatcolumn.o source/svm/svm.o -lreadline source/zlib.o: In 
function `boost::iostreams::detail::zlib_base::after(char const&, char&, bool)':
zlib.cpp:(.text+0x148): undefined reference to `crc32'
source/zlib.o: In function `boost::iostreams::detail::zlib_base::reset(bool, bool)':
zlib.cpp:(.text+0x1c1): undefined reference to `deflateReset'
zlib.cpp:(.text+0x1d6): undefined reference to `inflateEnd'
zlib.cpp:(.text+0x1e9): undefined reference to `inflateReset'
zlib.cpp:(.text+0x201): undefined reference to `deflateEnd'
source/zlib.o: In function `boost::iostreams::detail::zlib_base::do_init(boost::iostreams::zlib_params const&, bool, void* (*)(void*, unsigned int, unsigned int), void (*)(void*, void*), void*)':
zlib.cpp:(.text+0x30f): undefined reference to `inflateInit2'
zlib.cpp:(.text+0x377): undefined reference to `deflateInit2'
source/zlib.o: In function `boost::iostreams::detail::zlib_base::xdeflate(int)':
zlib.cpp:(.text+0x194): undefined reference to `deflate'
source/zlib.o: In function `boost::iostreams::detail::zlib_base::xinflate(int)':
zlib.cpp:(.text+0x1a4): undefined reference to `inflate'
collect2: error: ld returned 1 exit status
Makefile:79: recipe for target 'mothur' failed
make: *** [mothur] Error 1

I have no idea what the result of compiling this would be; I just want to use the updated version to fix the error I keep getting from classify.seqs ("xxxxx could not be classified", and all of my sequences end up with "unknown" taxonomy in the taxonomy file). PS: I'm getting that error with mothur 1.40.0 on Windows. Any suggestions?

Thank you in advance!

mothur-westcott commented 6 years ago

It looks like the compiler is unable to find the zlib files needed by boost. I have attached them for you, but I suspect your classification issues are unrelated to the version of mothur you are running.
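Every symbol reported as undefined in the log above (crc32, deflateReset, inflateEnd, and so on) is a function exported by the zlib compression library, which is why missing zlib is the likely cause. As a rough sketch, the missing symbol names can be pulled out of a linker log like this; `undefined_symbols` is a hypothetical helper written for this thread, not part of mothur:

```python
import re
from typing import List


def undefined_symbols(linker_output: str) -> List[str]:
    """Collect the unique symbol names from 'undefined reference to' lines."""
    # The optional quote character covers both the `name' style used by ld
    # and logs where the quotes were lost in copy and paste.
    pattern = re.compile(r"undefined reference to\s*[`']?([A-Za-z_][A-Za-z0-9_]*)")
    seen: List[str] = []
    for symbol in pattern.findall(linker_output):
        if symbol not in seen:
            seen.append(symbol)
    return seen


# Three lines taken from the log posted above.
log = """zlib.cpp:(.text+0x148): undefined reference to `crc32'
zlib.cpp:(.text+0x1c1): undefined reference to `deflateReset'
zlib.cpp:(.text+0x1d6): undefined reference to `inflateEnd'"""

print(undefined_symbols(log))
```

On many Linux systems these symbols are provided by the system zlib and pulled in with the `-lz` linker flag; whether that is the right change for the mothur makefile is an assumption on my part, so the attached files remain the suggested route.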

What references are you using for the classification?

Did you filter and screen the reads to ensure high quality sequences?

gzip-zlib.zip

VladimirAv commented 6 years ago

I'm using the .aln file as the reference (the complete alignment of the fasta files for my target organism). And yes, I did filter and screen the sequences; classify.seqs is actually about the 18th step of my protocol.

Thank you in advance

mothur-westcott commented 6 years ago

Could you post a sampling of your sequences and reference files so I can take a closer look?

VladimirAv commented 6 years ago

Yes, of course. I'll send you my .aln file (reference), my .tax file (taxonomy) and a pair of sequence files in .fastq format, 4 files in total. Thank you! Please let me know what else you need.

dinoflagelados.zip eukseqs.zip

mothur-westcott commented 6 years ago

Thanks for sending your files. The issue seems to be related to your reference files. When I use silva.seed_v128 as the reference, the *tax.summary looks like:

    taxlevel  rankID         taxon                            daughterlevels  total
    0         0              Root                             2               50653
    1         0.1            Eukaryota                        7               50499
    2         0.1.1          Arthropoda                       2               8080
    3         0.1.1.1        Arthropoda_unclassified          1               3046
    4         0.1.1.1.1      Arthropoda_unclassified          1               3046
    5         0.1.1.1.1.1    Arthropoda_unclassified          1               3046
    6         0.1.1.1.1.1.1  Arthropoda_unclassified          0               3046
    3         0.1.1.2        Maxillopoda                      1               5034
    4         0.1.1.2.1      Maxillopoda_or                   1               5034
    5         0.1.1.2.1.1    Maxillopoda_fa                   1               5034
    6         0.1.1.2.1.1.1  Maxillopoda_ge                   0               5034
    2         0.1.2          Basidiomycota                    1               1
    3         0.1.2.1        Exobasidiomycetes                1               1
    4         0.1.2.1.1      Malasseziales                    1               1
    5         0.1.2.1.1.1    Incertae_Sedis                   1               1
    6         0.1.2.1.1.1.1  Malassezia                       0               1
    2         0.1.3          Chlorophyta_ph                   1               3
    3         0.1.3.1        Chlorodendrophyceae              1               3
    4         0.1.3.1.1      Chlorodendrales                  1               3
    5         0.1.3.1.1.1    Chlorodendrales_fa               1               3
    6         0.1.3.1.1.1.1  Chlorodendrales_fa_unclassified  0               3
    2         0.1.4          Ciliophora                       1               2995
    3         0.1.4.1        Intramacronucleata               1               2995
    4         0.1.4.1.1      Spirotrichea                     1               2995
    5         0.1.4.1.1.1    Hypotrichia                      2               2995
    6         0.1.4.1.1.1.1  Hypotrichia_unclassified         0               2990
    6         0.1.4.1.1.1.2  Pseudouroleptus                  0               5
    2         0.1.5          Dinoflagellata                   1               12535
    3         0.1.5.1        Dinophyceae                      3               12535
    4         0.1.5.1.1      Dinophyceae_unclassified         1               11041
    5         0.1.5.1.1.1    Dinophyceae_unclassified         1               11041
    6         0.1.5.1.1.1.1  Dinophyceae_unclassified         0               11041
    4         0.1.5.1.2      Gymnodiniphycidae                2               1470
    5         0.1.5.1.2.1    Gymnodiniphycidae_unclassified   1               1
    6         0.1.5.1.2.1.1  Gymnodiniphycidae_unclassified   0               1
    5         0.1.5.1.2.2    Gymnodinium_clade                1               1469
    6         0.1.5.1.2.2.1  Gymnodinium                      0               1469
    4         0.1.5.1.3      Peridiniphycidae                 2               24
    5         0.1.5.1.3.1    Gonyaulacales                    1               3
    6         0.1.5.1.3.1.1  Gonyaulacales_unclassified       0               3
    5         0.1.5.1.3.2    Peridiniphycidae_unclassified    1               21
    6         0.1.5.1.3.2.1  Peridiniphycidae_unclassified    0               21
    2         0.1.6          Eukaryota_unclassified           1               26124
    3         0.1.6.1        Eukaryota_unclassified           1               26124
    4         0.1.6.1.1      Eukaryota_unclassified           1               26124
    5         0.1.6.1.1.1    Eukaryota_unclassified           1               26124
    6         0.1.6.1.1.1.1  Eukaryota_unclassified           0               26124
    2         0.1.7          Vertebrata                       1               761
    3         0.1.7.1        Mammalia                         1               761
    4         0.1.7.1.1      Mammalia_or                      1               761
    5         0.1.7.1.1.1    Mammalia_fa                      1               761
    6         0.1.7.1.1.1.1  Mammalia_ge                      0               761
    1         0.2            unknown                          1               154
    2         0.2.1          unknown_unclassified             1               154
    3         0.2.1.1        unknown_unclassified             1               154
    4         0.2.1.1.1      unknown_unclassified             1               154
    5         0.2.1.1.1.1    unknown_unclassified             1               154
    6         0.2.1.1.1.1.1  unknown_unclassified             0               154

VladimirAv commented 6 years ago

I know! It is possible to classify them with the silva database, but not with this one, even though I'm following the same format. I only modified it by deleting all the "18S", "clone" and "gene" words from the taxonomy file, leaving just the taxonomy names, and then copied them in the same order into the alignment file.

However, I keep getting the same thing: "sequence xxx could not be classified". It's so weird; I really don't know what to do.

mothur-westcott commented 6 years ago

The quality of the classification is dependent on the quality of the reference. When I run summary.seqs on your references, I am seeing this:

                Start   End     NBases  Ambigs  Polymer NumSeqs
    Minimum:    1       108     108     0       3       1
    2.5%-tile:  1       297     297     0       4       202
    25%-tile:   1       495     495     0       4       2015
    Median:     1       788     788     0       6       4030
    75%-tile:   1       1199    1199    0       6       6045
    97.5%-tile: 1       1777    1777    2       8       7858
    Maximum:    1       1896    1896    200     100     8059
    Mean:       1       884     884     0       5
    # of Seqs:  8059

The silva reference looks like:

Using 4 processors.

                Start   End     NBases  Ambigs  Polymer NumSeqs
    Minimum:    1       1358    1358    0       4       1
    2.5%-tile:  1       1398    1398    0       5       281
    25%-tile:   1       1443    1443    0       5       2804
    Median:     1       1461    1461    0       6       5607
    75%-tile:   1       1489    1489    0       6       8410
    97.5%-tile: 1       1744    1744    4       7       10933
    Maximum:    1       2837    2837    5       12      11213
    Mean:       1       1515    1515    0       5
    # of Seqs:  11213

VladimirAv commented 6 years ago

Here is what I did.

I ran screen.seqs on the database to remove the sequences with high numbers of ambiguous bases and homopolymers, which took it from 8059 to about 7500 sequences. With that file, I modified the taxonomy file so that both have the same number of lines. I ran summary.seqs on the final file (the aln file) and it showed me this:

                Start   End     NBases  Ambigs  Polymer NumSeqs
    Minimum:    1       1401    108     0       3       1
    2.5%-tile:  109     1969    294     0       4       190
    25%-tile:   383     3624    451     0       4       1900
    Median:     1132    5104    757     0       6       3799
    75%-tile:   1991    6820    1177    0       6       5698
    97.5%-tile: 2926    7054    1744    1       7       7408
    Maximum:    7062    7706    1777    2       8       7597
    Mean:       1221    5153    850     0       5
    # of Seqs:  7597

When I run classify.seqs, the command still gives me the same error for the sequences. PS: I'm sorry to keep going on about this issue, but I really don't know what to do. Thank you so much for your patience.

VladimirAv commented 6 years ago

I'm only guessing, but could it be a formatting error? Maybe mothur expects the complete taxonomy from domain down to species and needs a name between every pair of ";"? Or do you think it is just a problem with the bases in the alignment?

mothur-westcott commented 6 years ago

Mothur sets no requirements for the length of taxonomies or taxon names. The taxonomy reference just needs to be tab-separated, with taxa delimited by ';'.
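A quick way to sanity-check a reference pair before running classify.seqs is to parse the taxonomy file and compare its sequence names with those in the fasta or alignment file. The Python below is a minimal sketch under the format described above (one name, a tab, then a ';'-delimited lineage); the function names are hypothetical and none of this is mothur code.

```python
from typing import Dict, List, Tuple


def parse_taxonomy(text: str) -> Dict[str, List[str]]:
    """Map each sequence name to its list of taxa."""
    taxonomy: Dict[str, List[str]] = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        parts = line.split("\t")
        if len(parts) != 2:
            raise ValueError(
                f"line {line_no}: expected 2 tab-separated fields, got {len(parts)}"
            )
        name, lineage = parts
        # A trailing ';' produces an empty last item, which is dropped here.
        taxonomy[name] = [taxon for taxon in lineage.strip().split(";") if taxon]
    return taxonomy


def fasta_names(text: str) -> List[str]:
    """Return the sequence names (first word after '>') in fasta text."""
    return [
        line[1:].split()[0]
        for line in text.splitlines()
        if line.startswith(">") and line[1:].strip()
    ]


def compare_names(fasta_text: str, tax_text: str) -> Tuple[List[str], List[str]]:
    """Return (names only in the fasta, names only in the taxonomy)."""
    in_fasta = set(fasta_names(fasta_text))
    in_tax = set(parse_taxonomy(tax_text))
    return sorted(in_fasta - in_tax), sorted(in_tax - in_fasta)
```

If both lists returned by `compare_names` are empty, every reference sequence has a taxonomy line and vice versa. A line that raises `ValueError` usually means spaces were used where a tab was expected.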

The classify.seqs command looks at the kmers in the references and calculates the probability a given kmer will be present in a given classification. Mothur then finds the most probable classification using the kmers in the query sequence. I'm afraid I can't be of more help.
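To make the idea concrete, here is a heavily simplified Python sketch of a kmer-based naive Bayesian classifier of the kind described above. It is not mothur's implementation: it ignores bootstrapping and the taxonomy hierarchy, the word-probability smoothing follows my reading of the published RDP-style formula, and the names `kmers`, `train` and `classify` as well as the kmer size used in the example are illustrative assumptions.

```python
import math
from collections import defaultdict


def kmers(seq, k):
    """Return the set of distinct kmers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(references, k):
    """references: list of (sequence, taxon) pairs."""
    n_total = len(references)
    seqs_with_kmer = defaultdict(int)                     # references containing each kmer
    taxon_size = defaultdict(int)                         # references per taxon
    taxon_kmer = defaultdict(lambda: defaultdict(int))    # per-taxon kmer counts
    for seq, taxon in references:
        taxon_size[taxon] += 1
        for word in kmers(seq, k):
            seqs_with_kmer[word] += 1
            taxon_kmer[taxon][word] += 1
    return n_total, seqs_with_kmer, taxon_size, taxon_kmer


def classify(query, model, k):
    """Return the taxon with the highest summed log kmer probability."""
    n_total, seqs_with_kmer, taxon_size, taxon_kmer = model
    words = kmers(query, k)
    if not words:
        return None  # query shorter than k, nothing to score
    best_taxon, best_score = None, -math.inf
    for taxon, size in taxon_size.items():
        score = 0.0
        for word in words:
            # Smoothed overall frequency of the kmer across all references.
            prior = (seqs_with_kmer.get(word, 0) + 0.5) / (n_total + 1)
            # Smoothed probability of seeing the kmer in this taxon.
            score += math.log((taxon_kmer[taxon].get(word, 0) + prior) / (size + 1))
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon


# Toy example with k=3 and made-up sequences.
model = train([("AAAAAAAA", "taxon_A"), ("CCCCCCCC", "taxon_C")], k=3)
print(classify("AAAAA", model, k=3))
```

The sketch also shows why reference quality matters: a taxon represented only by short or fragmentary references contributes few kmers, so queries from that taxon score poorly against it.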

VladimirAv commented 6 years ago

I'm sorry for being persistent, but I actually managed to get mothur to run the command with no errors! It needed more taxonomic levels (I only had 3, so I added about 4 more named 'unknown'). mothur then ran the command to completion, but the taxonomy file is full of "unknown" (exactly as if the command had crashed).

EDIT: I got new information in my taxonomy file!!! After classifying, I just dropped the cutoff from 97 to 95 and I'm now seeing some taxa in the rows, I mean new taxa besides "unknown". I'm going to try different cutoffs. At least the mothur error is gone. Thank you for the suggestions anyway!
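For anyone reading later, the effect of the cutoff can be pictured with a small Python sketch. It assumes each rank in a classification carries a confidence score and that ranks below the cutoff are dropped; the function name `apply_cutoff`, the "unknown" placeholder and the scores in the example are made up for illustration and are not taken from mothur's code or from my data.

```python
def apply_cutoff(lineage, cutoff):
    """Keep ranks from the top down until one falls below the cutoff.

    lineage: list of (taxon, confidence) pairs, highest rank first.
    """
    kept = []
    for taxon, confidence in lineage:
        if confidence < cutoff:
            break  # everything below this rank is treated as unsupported
        kept.append(taxon)
    # Assumption: a lineage with no supported rank is reported as "unknown".
    return kept or ["unknown"]


# Hypothetical confidence scores for one sequence.
lineage = [("Eukaryota", 96), ("Dinoflagellata", 95), ("Dinophyceae", 80)]
print(apply_cutoff(lineage, 97))
print(apply_cutoff(lineage, 95))
```

With these made-up scores, a cutoff of 97 leaves nothing supported, while 95 keeps the first two ranks, which matches the kind of change I saw when lowering the cutoff.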