Closed sunshichao0916 closed 3 years ago
Hi,
Can you go to: https://github.com/pauline-ng/SIFT4G_Create_Genomic_DB#monitoring-the-database-creation-process
and run the commands above to check what's been created and what hasn't?
That's very odd that the protein .fa file is created but the database is not.
Thanks, Pauline
Hi, I reran the perl make-SIFT-db-all.pl -config command following the tutorial on GitHub. The following files were generated in the PARENT_DIR folder:
/chr-src/directory.index.dir /chr-src/directory.index.pag
/fasta/.fasta
/gene-annotation-src/noncoding.txt /gene-annotation-src/protein_coding_genes.txt
*/singleRecords/
*/subst
The following files were not generated:
Thank you, Sun Shichao
Hi Sun,
This looks correct. Just to double check, can you confirm your /singleRecords/ and /subst are not empty?
If those files are not empty, then everything looks right except for the call to the SIFT 4G algorithm (the algorithm that actually makes the predictions). Can you confirm that the path to the sift4g executable is correct? Also, when you run the test files, are you able to make predictions with SIFT 4G?
Thanks, Pauline
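The two checks requested above (non-empty output folders and a working sift4g path) can be sketched in Python as follows. This is only a rough helper, not part of SIFT4G_Create_Genomic_DB; the function names are hypothetical, and the folder names singleRecords and subst are taken from the config posted in this thread.

```python
import shutil
from pathlib import Path


def dir_has_files(path):
    """Return True if `path` is a directory holding at least one non-empty file."""
    p = Path(path)
    if not p.is_dir():
        return False
    return any(f.is_file() and f.stat().st_size > 0 for f in p.rglob("*"))


def check_outputs(parent_dir, sift4g_path, subdirs=("singleRecords", "subst")):
    """Report which output folders have content and whether sift4g is executable.

    `subdirs` defaults to the folder names used in the thread's config
    (an assumption; adjust if your config differs).
    """
    report = {name: dir_has_files(Path(parent_dir) / name) for name in subdirs}
    # shutil.which returns None when the file is missing or not executable.
    report["sift4g"] = shutil.which(str(sift4g_path)) is not None
    return report
```

Calling check_outputs(PARENT_DIR, SIFT4G_PATH) with the values from your config returns a small dictionary; any False entry points at the step to look into first.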
Hi Pauline,
First, I can confirm that my /singleRecords/ and /subst are not empty. Then I ran the test files, but they failed. Finally, I checked the sift4g executable and found that it does not work properly, so I reinstalled SIFT4G, but the following error is displayed during make. Is there any solution?
Sorry to trouble you so many times. Thank you, Sun Shichao
Hi Sun,
Robert maintains the sift4g algorithm. Please post this issue (the error you wrote just above) on
https://github.com/rvaser/sift4g
and Robert will probably be able to help.
Best, Pauline
Hi Sun, the error indicates that your compiler does not support the C++11 standard. Try updating both the gcc and g++ compilers.
Best regards, Robert
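To see whether the installed compiler is recent enough before rebuilding, something like the sketch below could help. It assumes GCC is the compiler in use and takes 4.8.1 as the threshold, since that is the first GCC release documented as feature-complete for C++11; the function names are hypothetical, and the actual error message is not shown in this thread.

```python
import re
import subprocess

# Assumption: GCC 4.8.1 is treated as the minimum for complete C++11 support.
MIN_GCC = (4, 8, 1)


def parse_version(text):
    """Turn '4.4.7' or '9' into a 3-tuple of ints; return None if no digits."""
    match = re.search(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", text)
    if not match:
        return None
    return tuple(int(g) if g else 0 for g in match.groups())


def gcc_is_cxx11_complete(version_text):
    """True if the version string is at or above MIN_GCC."""
    version = parse_version(version_text)
    return version is not None and version >= MIN_GCC


def installed_compiler_version(compiler="g++"):
    """Ask the compiler for its version; return None if it cannot be run."""
    try:
        result = subprocess.run(
            [compiler, "-dumpversion"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()
```

For example, gcc_is_cxx11_complete(installed_compiler_version() or "") gives a quick yes/no before running make again.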
Hi Robert,
I upgraded the compilers, but the database still failed to build. The alignment folder was empty, and the log file is as follows:
The test run also stops at the alignment step.
Thanks !
Best regards, Sun Shichao
What is printed after the sift4g command?
Hi Robert and Pauline, no error message was displayed, but the alignment step never continued. After two days of troubleshooting, I still could not find the problem.
The last lines printed to the nohup.out file are shown in the following figure:
Thanks. Sun
Hi Sun,
Can you confirm all_prot.fasta file contains protein sequences?
Also, were you able to run the test files OK?
Pauline
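A quick way to confirm that a file such as all_prot.fasta really contains protein sequences is to count its records and flag any with unexpected characters. The sketch below is only an illustration: the function name and the accepted alphabet (the 20 standard residues plus X, B, Z, U, O and the stop symbol *) are assumptions, not something SIFT 4G prescribes.

```python
# Assumed alphabet: standard amino acids plus common ambiguity/extra codes.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWYXBZUO*")


def summarize_fasta(path):
    """Count records and residues, and list record IDs with odd characters."""
    n_records = 0
    n_residues = 0
    unexpected_ids = []
    current_id = "<no header>"
    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            if not line:
                continue
            if line.startswith(">"):
                n_records += 1
                # The ID is the first whitespace-delimited token after '>'.
                current_id = (line[1:].split() or [""])[0]
                continue
            n_residues += len(line)
            if set(line.upper()) - AMINO_ACIDS and current_id not in unexpected_ids:
                unexpected_ids.append(current_id)
    return {
        "records": n_records,
        "residues": n_residues,
        "unexpected_ids": unexpected_ids,
    }
```

A result with zero records, or with many IDs under unexpected_ids, suggests the file is not a clean protein FASTA.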
Hi Pauline,
The all_prot.fasta file contains protein sequences.
The test files do not run normally; the run stops at the alignment step.
Thanks for your answers.
Sun
Hi Pauline and Robert,
Thank you for your patience. Unfortunately, for personal reasons, I am still unable to solve this problem.
Can I invite you or a member of the sift4g team to build the SIFT database for me?
Sincerely, Sun
Hi Sun,
I might be able to ask a former post-doc to build it for you.
For academia, we ask that the person who builds the SIFT database be added as an author on the paper.
For industry, it's a service and a fee would be charged.
Thanks, Pauline
Hi Pauline,
We are building the SIFT database for academic purposes, and we agree to add the person who builds the SIFT database as an author on the paper. How can I contact you?
Thank you Pauline.
Sun
@sunshichao0916
I tried looking up your email address. Is it something like 3........33@qq.com ?
If yes, please check your inbox and respond to me, and we can get the ball rolling.
Thanks, Pauline
Update: First, I removed the protein sequences containing "XXX", and then I divided the large file into several separate files by chromosome number. When I ran the database build command again, the database was built successfully. Thanks a lot, miss!
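For anyone hitting the same problem, the first step of this fix (dropping sequences that contain "XXX") might look roughly like the Python sketch below. It assumes the file being filtered is a protein FASTA; the thread does not say exactly which file was filtered or which one was split by chromosome, so the splitting step is not shown. All function names here are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_masked(records, pattern="XXX"):
    """Keep only records whose full sequence does not contain `pattern`."""
    for header, seq in records:
        if pattern not in seq.upper():
            yield header, seq


def write_fasta(records, handle, width=60):
    """Write records back out, wrapping sequence lines at `width` characters."""
    for header, seq in records:
        handle.write(header + "\n")
        for start in range(0, len(seq), width):
            handle.write(seq[start:start + width] + "\n")
```

Sequences are joined before matching, so a run of X that is wrapped across two lines in the input is still caught.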
Great, thanks for figuring it out!
Hey Pauline! It works fine when I split the run by chromosome. But I don't understand the "PROTEIN_DB=" setting. When I build a database for a species, do I have to use the proteome of that species, or is this option for selecting species related to it?
Thank you! dcf
Hi Dcf,
PROTEIN_DB should be a database of protein sequences such as NCBI non-redundant (nr), Swiss-Prot/TrEMBL, etc. SIFT will search this database for homologous sequences.
Thanks, Pauline
Hi Pauline. I got it! Thank you very much!
Hi SIFT4G team, I have a problem building my own database with the sift4g tools. When I execute the perl make-SIFT-db-all.pl -config Glymax_config.txt command, an all_prot.fasta file is generated in the PARENT_DIR folder, but no database is generated and no error is returned. I don't know where the problem is. I hope you can help, thank you.
My input files are shown below:
GENETIC_CODE_TABLE=1
GENETIC_CODE_TABLENAME=Standard
MITO_GENETIC_CODE_TABLE=11
MITO_GENETIC_CODE_TABLENAME=Plant Plastid Code
PARENT_DIR=/vol3/agis/wangli_group/sunshichao/soybean/P101SC17040637-01-F004/SIFT4G/PARENT_DIR
ORG=Glycine_max
ORG_VERSION=Gma2.v1
Running SIFT 4G
SIFT4G_PATH=/vol3/agis/wangli_group/sunshichao/miniconda3/bin/sift4g
PROTEIN_DB=/vol3/agis/wangli_group/sunshichao/soybean/P101SC17040637-01-F004/SIFT4G/database/uniref90.fasta
Sub-directories, don't need to change
LOGFILE=Log.txt
ZLOGFILE=Log2.txt
GENE_DOWNLOAD_DEST=gene-annotation-src
CHR_DOWNLOAD_DEST=chr-src
FASTA_DIR=fasta
SUBST_DIR=subst
SIFT_SCORE_DIR=SIFT_predictions
SINGLE_REC_BY_CHR_DIR=singleRecords/
SINGLE_REC_WITH_SIFTSCORE_DIR=singleRecords_with_scores
DBSNP_DIR=dbSNP
Doesn't need to change
FASTA_LOG=fasta.log
INVALID_LOG=invalid.log
PEPTIDE_LOG=peptide.log
ENS_PATTERN=ENS
SINGLE_RECORD_PATTERN=:change:_aa1valid_dbsnp.singleRecord
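Because the build can stop without printing an error, it may help to check the paths in a config like this one before launching it. The Python sketch below is a hypothetical pre-flight check, not part of the SIFT4G scripts. It assumes the config file on disk has one KEY=VALUE entry per line, and it only looks at the PARENT_DIR, SIFT4G_PATH and PROTEIN_DB keys shown above.

```python
import os
import shutil


def parse_config(lines):
    """Parse KEY=VALUE lines into a dict, skipping blanks, '#' comments and headings."""
    config = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may contain spaces or '='.
        key, value = line.split("=", 1)
        config[key.strip()] = value.strip()
    return config


def check_paths(config):
    """Return a list of human-readable problems found in the config paths."""
    problems = []
    parent = config.get("PARENT_DIR")
    if not parent or not os.path.isdir(parent):
        problems.append("PARENT_DIR is missing or not a directory")
    sift4g = config.get("SIFT4G_PATH")
    # shutil.which returns None for a missing or non-executable file.
    if not sift4g or shutil.which(sift4g) is None:
        problems.append("SIFT4G_PATH is missing or not executable")
    protein_db = config.get("PROTEIN_DB")
    if (not protein_db or not os.path.isfile(protein_db)
            or os.path.getsize(protein_db) == 0):
        problems.append("PROTEIN_DB is missing or empty")
    return problems
```

Typical use would be check_paths(parse_config(open("Glymax_config.txt"))); an empty list means these three paths look usable, although it does not prove that sift4g itself runs correctly.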