pauline-ng / SIFT4G_Create_Genomic_DB

Create genomic databases with SIFT predictions. Input is an organism's genomic DNA (.fa) file and the gene annotation file (.gtf). Output will be a database that can be used with SIFT4G_Annotator.jar to annotate VCF files.
GNU General Public License v3.0

error no such file or directory #90

Closed: sagitaninta closed this issue 8 months ago

sagitaninta commented 8 months ago

I have read this issue, which describes exactly the same problem, but no matter how I change the parent directory, it keeps throwing a "No such file or directory" error. I am so frustrated; it has been two days with no luck. I hope someone can tell me what I did wrong. This is just trying to run the test files.

So the config file looks like this:

GENE_DOWNLOAD_SITE=ftp://ftp.ensemblgenomes.org/pub/bacteria/release-34/gtf//bacteria_11_collection/candidatus_carsonella_ruddii_pv/Candidatus_carsonella_ruddii_pv.ASM1036v1.34.gtf.gz
PEP_FILE=ftp://ftp.ensemblgenomes.org/pub/bacteria/release-34/fasta//bacteria_11_collection/candidatus_carsonella_ruddii_pv/pep/Candidatus_carsonella_ruddii_pv.ASM1036v1.pep.all.fa.gz
CHR_DOWNLOAD_SITE=ftp://ftp.ensemblgenomes.org/pub/bacteria/release-34/fasta//bacteria_11_collection/candidatus_carsonella_ruddii_pv/dna/

GENETIC_CODE_TABLE=11
GENETIC_CODE_TABLENAME=11
MITO_GENETIC_CODE_TABLE=0
MITO_GENETIC_CODE_TABLENAME=Unspecified

PARENT_DIR=/home/mdn487/db_test #prev: /bigdrive/SIFT_databases//candidatus_carsonella_ruddii_pv
ORG=candidatus_carsonella_ruddii_pv
ORG_VERSION=ASM1036v1.34

#Running SIFT 4G
SIFT4G_PATH=/home/mdn487/bin/sift4g/bin/sift4g

# protein database must be uncompressed
PROTEIN_DB=/maps/projects/bos/people/mdn487/pur_sel//SIFT4G_Create_Genomic_DB/uniprot_sprot.fasta #prev: /bigdrive/SIFT_databases/uniprot_sprot.fasta

# Sub-directories, don't need to change
LOGFILE=Log.txt
ZLOGFILE=Log2.txt
GENE_DOWNLOAD_DEST=gene-annotation-src
CHR_DOWNLOAD_DEST=chr-src
FASTA_DIR=fasta
SUBST_DIR=subst
SIFT_SCORE_DIR=SIFT_predictions
SINGLE_REC_BY_CHR_DIR=singleRecords/
SINGLE_REC_WITH_SIFTSCORE_DIR=singleRecords_with_scores
DBSNP_DIR=dbSNP

# Doesn't need to change
FASTA_LOG=fasta.log
INVALID_LOG=invalid.log
PEPTIDE_LOG=peptide.log
ENS_PATTERN=ENS
SINGLE_RECORD_PATTERN=:change:_aa1valid_dbsnp.singleRecord

And this is what it looks like when I run the command: (screenshot: 2024-01-11 at 11 26 18)

Mind you, I have also tried setting PARENT_DIR to:

/maps/projects/bos/people/mdn487/pur_sel//SIFT4G_Create_Genomic_DB/
/maps/projects/bos/people/mdn487/

And it still won't work.

ls /home/mdn487 results in

bin  db_test  ucph

So it should be accessible to the program, right?

I am running out of ideas. Thanks in advance for everyone helping me!

pauline-ng commented 8 months ago

I don't think it should make a difference, but there should only be one slash:

# Original
/maps/projects/bos/people/mdn487/pur_sel//SIFT4G_Create_Genomic_DB/
# New
/maps/projects/bos/people/mdn487/pur_sel/SIFT4G_Create_Genomic_DB/
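As a quick sanity check on why the doubled slash is unlikely to be the culprit: POSIX path resolution treats a run of slashes inside a path as a single separator. The sketch below only compares the two strings after normalization; it does not touch the filesystem, and it says nothing about how the pipeline's own scripts handle the value.

```python
import posixpath

# The two spellings of the directory from this thread.
original = "/maps/projects/bos/people/mdn487/pur_sel//SIFT4G_Create_Genomic_DB/"
new = "/maps/projects/bos/people/mdn487/pur_sel/SIFT4G_Create_Genomic_DB/"

# normpath collapses repeated interior slashes and drops the trailing one,
# so both spellings reduce to the same normalized string.
same = posixpath.normpath(original) == posixpath.normpath(new)
print(same)
```

If the two normalized strings match, the path spelling itself can be ruled out and the search can move on to how the value is read from the config file.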

When you type

ls /maps/projects/bos/people/mdn487/pur_sel//SIFT4G_Create_Genomic_DB/common-utils.pl

is the common-utils.pl file listed?

Also, I have never tested it with a Perl virtual environment. Is it possible for you to run it without one?

sagitaninta commented 8 months ago

Thank you very much for the prompt response!

So I have changed the config to look like this now:

GENE_DOWNLOAD_SITE=ftp://ftp.ensemblgenomes.org/pub/bacteria/release-34/gtf//bacteria_11_collection/candidatus_carsonella_ruddii_pv/Candidatus_carsonella_ruddii_pv.ASM1036v1.34.gtf.gz
PEP_FILE=ftp://ftp.ensemblgenomes.org/pub/bacteria/release-34/fasta//bacteria_11_collection/candidatus_carsonella_ruddii_pv/pep/Candidatus_carsonella_ruddii_pv.ASM1036v1.pep.all.fa.gz
CHR_DOWNLOAD_SITE=ftp://ftp.ensemblgenomes.org/pub/bacteria/release-34/fasta//bacteria_11_collection/candidatus_carsonella_ruddii_pv/dna/

GENETIC_CODE_TABLE=11
GENETIC_CODE_TABLENAME=11
MITO_GENETIC_CODE_TABLE=0
MITO_GENETIC_CODE_TABLENAME=Unspecified

PARENT_DIR=/maps/projects/bos/people/mdn487/pur_sel/SIFT4G_Create_Genomic_DB #prev: /bigdrive/SIFT_databases//candidatus_carsonella_ruddii_pv
ORG=candidatus_carsonella_ruddii_pv
ORG_VERSION=ASM1036v1.34

#Running SIFT 4G
SIFT4G_PATH=/home/mdn487/bin/sift4g/bin/sift4g

# protein database must be uncompressed
PROTEIN_DB=/maps/projects/bos/people/mdn487/pur_sel/SIFT4G_Create_Genomic_DB/uniprot_sprot.fasta #prev: /bigdrive/SIFT_databases/uniprot_sprot.fasta

# Sub-directories, don't need to change
LOGFILE=Log.txt
ZLOGFILE=Log2.txt
GENE_DOWNLOAD_DEST=gene-annotation-src
CHR_DOWNLOAD_DEST=chr-src
FASTA_DIR=fasta
SUBST_DIR=subst
SIFT_SCORE_DIR=SIFT_predictions
SINGLE_REC_BY_CHR_DIR=singleRecords/
SINGLE_REC_WITH_SIFTSCORE_DIR=singleRecords_with_scores
DBSNP_DIR=dbSNP

# Doesn't need to change
FASTA_LOG=fasta.log
INVALID_LOG=invalid.log
PEPTIDE_LOG=peptide.log
ENS_PATTERN=ENS
SINGLE_RECORD_PATTERN=:change:_aa1valid_dbsnp.singleRecord

And the result is still the same. (screenshot: 2024-01-11 at 12 55 07) Adding and removing the double slash is among the things I had already tried. Besides, the error starts at the PARENT_DIR line.

The output of ls /maps/projects/bos/people/mdn487/pur_sel//SIFT4G_Create_Genomic_DB/common-utils.pl looks like this: (screenshot: 2024-01-11 at 12 59 12)

The perl_env there means I used a conda environment to install the Perl dependencies listed here. Installing them directly in my local environment without conda gets messy. The Docker route needs a system-level installation, and my server only has udocker, which unfortunately has entirely different syntax, and I have a hard time running it.

sagitaninta commented 8 months ago

I GOT IT

The code that reads variables from the config file STILL READS AFTER the '#', so the trailing comment becomes part of the value. The PARENT_DIR there is not

/maps/projects/bos/people/mdn487/pur_sel/SIFT4G_Create_Genomic_DB

but

/maps/projects/bos/people/mdn487/pur_sel/SIFT4G_Create_Genomic_DB #prev: /bigdrive/SIFT_databases//candidatus_carsonella_ruddii_pv
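For anyone hitting the same thing, here is a minimal sketch of the failure mode. It is not the repository's actual Perl code, which I have not checked; `parse_config_naive` and `parse_config_strip_comments` are hypothetical helpers that only illustrate how a KEY=VALUE reader can swallow a trailing `#` comment, and one way to avoid it.

```python
def parse_config_naive(text):
    """Split each line on the first '=' and keep everything after it."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and whole-line comments are skipped
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()  # a trailing '#...' stays in the value
    return values


def parse_config_strip_comments(text):
    """Same idea, but drop everything from the first '#' on each line."""
    values = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


# Three lines taken from the config posted in this thread.
config = """\
#Running SIFT 4G
PARENT_DIR=/home/mdn487/db_test #prev: /bigdrive/SIFT_databases//candidatus_carsonella_ruddii_pv
ORG=candidatus_carsonella_ruddii_pv
"""

naive = parse_config_naive(config)
clean = parse_config_strip_comments(config)
print(repr(naive["PARENT_DIR"]))  # path plus the '#prev: ...' text
print(repr(clean["PARENT_DIR"]))  # path only
```

Cutting at the first `#` would also truncate any value that legitimately contains a `#`, so the simplest fix on the user side is to keep comments on their own lines in the config file.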

It works now! I had just forgotten to download the UniProt FASTA file, so it hit an error at that step.

Very sorry for my stupidity, and I hope someone finds this helpful.