Closed marlt closed 4 years ago
https://www.uniprot.org/downloads#uniprotkblink (TrEMBL, UniRef)
UniRef100/90/50 differ in size by several GB
Since UniRef50 did not show the desired extension of prokka hyprots, is the RCSB protein DB worth a try? Also, we could try a different test genome so that we see results for more than just Chlamydia ...
Another idea: make the sensitivity of mmseqs adjustable. The default is 5.7, and it can be set anywhere between 1.0 (fast, less sensitive) and 7.5 (slow, more sensitive).
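A minimal sketch of what an adjustable sensitivity could look like, assuming the pipeline calls `mmseqs search` and passes the value through the `-s` option. The function name `mmseqs_search_cmd` and the database names are hypothetical; only the command line is built here, nothing is executed.

```python
def mmseqs_search_cmd(query_db, target_db, result_db, tmp_dir, sensitivity=5.7):
    """Build an `mmseqs search` argument list with an adjustable -s value.

    Hypothetical helper: the 1.0 to 7.5 range and the 5.7 default are the
    values mentioned in this thread.
    """
    if not 1.0 <= sensitivity <= 7.5:
        raise ValueError(f"sensitivity must be in [1.0, 7.5], got {sensitivity}")
    return ["mmseqs", "search", query_db, target_db, result_db, tmp_dir,
            "-s", str(sensitivity)]


# Placeholder database names, most sensitive setting
cmd = mmseqs_search_cmd("queryDB", "targetDB", "resultDB", "tmp", sensitivity=7.5)
print(" ".join(cmd))
```

The list could then be handed to `subprocess.run(cmd, check=True)`, which would also surface a crash as an exception.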
@rcsb db: I don't know this database. The question is how easily we can switch to another FASTA format.
@test genome: also a good idea, so far we are only testing Chlamydia. Another bacterium would also be worth a try. Maybe Mycoplasma bovis? But attention, here the genetic code for prokka must be switched from 11 (the bacterial default) to 4. Or Mycobacterium paratuberculosis. You can download a genome we assembled years ago here: https://www.rna.uni-jena.de/supplements/mycobacterium/genomes/MycAviPar386.final.fasta
I think M. para would be a good test case.
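A rough sketch of how the genetic code could be chosen per test genome, assuming prokka is called with its `--gcode` option and that translation table 11 is the bacterial default. The lookup table, the helper name `prokka_cmd` and the output paths are hypothetical.

```python
# Hypothetical lookup: organisms that need a non-default translation table.
# Mycoplasma uses table 4; everything else falls back to table 11.
GENETIC_CODE_OVERRIDES = {"Mycoplasma bovis": 4}
DEFAULT_GCODE = 11


def prokka_cmd(fasta, organism, outdir, prefix):
    """Build a prokka argument list with the matching --gcode value."""
    gcode = GENETIC_CODE_OVERRIDES.get(organism, DEFAULT_GCODE)
    return ["prokka", "--outdir", outdir, "--prefix", prefix,
            "--gcode", str(gcode), fasta]


# The file name is taken from the download link above; outdir and prefix
# are placeholders.
para = prokka_cmd("MycAviPar386.final.fasta",
                  "Mycobacterium paratuberculosis", "prokka_out", "mpara")
bovis = prokka_cmd("m_bovis.fasta", "Mycoplasma bovis", "prokka_out", "mbovis")
print(" ".join(para))
print(" ".join(bovis))
```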
@sensitivity 5.7: what does this number tell me? I have the feeling we can simply go with high sensitivity and set this to 7.5, but maybe this will not change much.
I tried to change the sensitivity to something other than 5.7, and it always crashed with a segfault. I left this out for now, since it isn't so promising anyway... The PDB is a protein structure database that uses sequences from UniProt, NCBI gen, Pfam, ExPASy, PubMed and SCOP. According to the homepage it is curated, and FASTA sequences are available. The FASTA is small, just 30 MB.
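On the question of switching FASTA formats: a small sketch of a header parser that copes with several layouts, assuming UniProtKB headers look like `>sp|ACCESSION|ENTRY_NAME description` and that UniRef and PDB seqres headers put the ID in the first whitespace-separated token. The function name and the example headers are illustrative only and should be checked against the actual downloaded files.

```python
def parse_fasta_header(header):
    """Return (sequence_id, description) for a FASTA header line.

    Assumed layouts (verify against the real files):
      UniProtKB:  >sp|P0A7G6|RECA_ECOLI Protein RecA ...
      UniRef:     >UniRef50_Q9K794 Putative AgrB-like protein ...
      PDB seqres: >101m_A mol:protein length:154  MYOGLOBIN
    """
    text = header.lstrip(">").strip()
    first, _, rest = text.partition(" ")
    parts = first.split("|")
    # UniProtKB style: db|accession|entry_name, keep the accession
    if len(parts) == 3 and parts[0] in ("sp", "tr"):
        return parts[1], rest.strip()
    # Otherwise the first token is used as the ID unchanged
    return first, rest.strip()


examples = [
    ">sp|P0A7G6|RECA_ECOLI Protein RecA OS=Escherichia coli (strain K12)",
    ">UniRef50_Q9K794 Putative AgrB-like protein n=1 Tax=Bacillus halodurans",
    ">101m_A mol:protein length:154  MYOGLOBIN",
]
ids = [parse_fasta_header(h)[0] for h in examples]
print(ids)
```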
So UniRef50 works properly now, I would say. It gives some improvement for the hyprots compared to UniProtKB. PDB is still included, but it does not really add more information.
We leave this issue open for later reference. At the moment, using UniProt Swiss-Prot is fine.