johnlees / bioinformatics

Utilities written for bioinformatics
http://leesjohn.wordpress.com/

Error in alf_db_2_fasta.pl #1

Closed Morteza-M-Saber closed 6 years ago

Morteza-M-Saber commented 6 years ago

With Perl 5.18.2, running alf_db_2_fasta.pl gives the following error:

perl alf_db_2_fasta.pl se_ECOLI_core.db > coli_fasta.fa
Use of uninitialized value $1 in subtraction (-) at alf_db_2_fasta.pl line 24, line 2.

johnlees commented 6 years ago

The pattern it's trying to match is

m/^>.+, sequence type: type\d+, locus: (\d+)$/

Can you check what the header lines look like (e.g. grep ">" se_ECOLI_core.db) to see why they're not matching?
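If it helps, here is a rough Python sketch of the same check. It translates the Perl pattern above and lists the header lines that fail it. In Perl a failed match leaves $1 undefined, which would explain the warning. The function name `unmatched_headers` and the sample headers in the comments are made up for illustration and are not taken from alf_db_2_fasta.pl or from the actual .db file.

```python
import re

# Python translation of the Perl pattern quoted above.
HEADER_RE = re.compile(r"^>.+, sequence type: type\d+, locus: (\d+)$")


def unmatched_headers(lines):
    """Return (line_number, header) pairs for '>' lines that fail the pattern."""
    bad = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Only header lines are checked; sequence lines are skipped.
        if line.startswith(">") and HEADER_RE.match(line) is None:
            bad.append((number, line))
    return bad


# Hypothetical usage on the file from this issue:
#     with open("se_ECOLI_core.db") as handle:
#         for number, header in unmatched_headers(handle):
#             print(f"line {number}: {header}")
```

A header such as `>G1, sequence type: type1, locus: 12` (invented here) would pass, while a bare `>ECOLI0001` would be reported.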

Morteza-M-Saber commented 6 years ago

The header doesn't match this pattern. 'se_ECOLI_core.db' is in the 'realseed' folder of the ALF package. Apparently ALF's input file does not need to have the same header pattern as its output. Do you happen to know how the different sequence types should be specified in the input sequences, so that ALF evolves them with the substitution matrices assigned to those sequence types in the parameter file?

johnlees commented 6 years ago

Ah right, it's probably specific to the starting sequence I used then (did I send you that? Let me know if you need it). I vaguely remember that ALF can have different partitions that evolve in different ways, but I don't know off the top of my head how starting sequences are assigned to them. If you can't work it out from the manual/config file, let me know and I'll look into it.

Morteza-M-Saber commented 6 years ago

The starting sequence you used, '450_S_pneumo_genes.db', is not included in the folder you uploaded to the Sanger FTP. It would be useful if you could send it as well. Thank you in advance. ALF can have different partitions of sequences that evolve in different ways. The tutorial explains how to assign different proportions of randomly generated sequences to evolve differently, but it says nothing about how to assign real sequences to different evolutionary rates. I am not sure whether ALF has that option, but if you happen to know the solution I would appreciate it if you could share it.

johnlees commented 6 years ago

I've attached all the .db files I have, which include the reduced-size and full-size pneumo gene db files: db_files.tar.gz

I think this section of the config file allows you to assign the starting genes to each class:

###
## model selection
# enumerate combinations of substitution/indel/rate variation models that define
# your different types of sequences
seqTypes := [[1,1,1,'type1'], [2,1,1,'type2']]:

# supply an array of frequencies of types 1..n defined above for random assignment
# supply an array of length protStart with type assignments for each initial gene
#seqTypeAssignments := [1]:
seqTypeAssignments := [0.75, 0.25]:
#seqTypeAssignments := [0.25, 0.5, 0.25]:

If you had 450 starting genes, you could set seqTypeAssignments := ['type1', 'type1', 'type2', ..., 'type1'] so that each gene corresponds to one entry in that array, with each entry being one of the types defined in seqTypes.
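Typing 450 entries by hand is error-prone, so the line could be generated instead. Below is a minimal Python sketch that follows the suggestion above. The helper name `build_seq_type_assignments` and the 300/150 split are hypothetical. It also assumes ALF accepts the quoted type names here. The config comment talks about "types 1..n", so numeric indices may be what is actually expected, and that is worth checking against the ALF manual.

```python
def build_seq_type_assignments(type_names):
    """Format one type name per starting gene as a seqTypeAssignments line."""
    if not type_names:
        raise ValueError("need at least one starting gene")
    # Assumption: entries are quoted type names, as suggested in this thread.
    quoted = ", ".join(f"'{name}'" for name in type_names)
    return f"seqTypeAssignments := [{quoted}]:"


# Hypothetical split: the first 300 genes are 'type1', the remaining 150 'type2'.
assignments = ["type1"] * 300 + ["type2"] * 150
line = build_seq_type_assignments(assignments)
```

The resulting `line` could then be pasted into the parameter file in place of the frequency-based `seqTypeAssignments` entry.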

Morteza-M-Saber commented 6 years ago

I see. Thank you very much for the db files and information.