Open subwaystation opened 4 years ago
So I could only get a FASTA output for each alignment:
from Bio import AlignIO
for multiple_alignment in AlignIO.parse("public_sequence_resource_22042020.maf", "maf"):
print("printing a new multiple alignment")
for seqrec in multiple_alignment:
# print("starts at %s on the %s strand of a sequence %s in length, and runs for %s bp" % \
# (seqrec.annotations["start"],
# seqrec.annotations["strand"],
# seqrec.annotations["srcSize"],
# seqrec.annotations["size"]))
splitty = seqrec.id.split("/")
splittty = splitty[3].split(":")
AlignIO.write(multiple_alignment,
"%s.fa" % splittty[1],
"fasta")
Which is painfully slow of course......
from Bio import AlignIO
abc = AlignIO.parse("public_sequence_resource_22042020.maf", "maf")
AlignIO.write(abc,"public_sequence_resource_22042020.maf.fasta", "fasta")
Blindly does the job we want. I guess.
@ekg We should contact people who actually have to work with the data. You got anyone in mind?
Using
H VN:Z:1.0
S 1 CAAATAAG
S 2 A
S 3 GCT
S 4 T
S 5 C
S 6 TTG
S 7 A
S 8 G
S 9 AAATTTTCTGGA
S 10 GATTACA
S 11 ATATGCATC
S 12 A
S 13 TCA
S 14 CCAACTCTCTG
P x 1+,3+,5+,6+,8+,9+,10+,11+,13+,14+ *
P y 1+,5+,6+,8+,9-,10+,11+,13+,11+,14+ *
L 1 + 3 + 0M
L 3 + 5 + 0M
L 5 + 6 + 0M
L 6 + 8 + 0M
L 8 + 9 + 0M
L 8 + 9 - 0M
L 9 + 10 + 0M
L 9 - 10 + 0M
L 10 + 11 + 0M
L 11 + 13 + 0M
L 11 + 14 + 0M
L 13 + 14 + 0M
L 13 + 11 + 0M
and doing
maffer -g t.gfa -m > t.maf
msa_view t.maf
I have a similar problem
ERROR: bad integers or strand in MAF (strand must be + for reference sequence) --
"s y 13 12 - 64 AAATTTTCTGGA"
that suggests a malformed MAF output.
I tried the following:
Download the current public sequence resource GFA from https://workbench.lugli.arvadosapi.com/collections/lugli-4zz18-z513nlpqm03hpca. Then I did:
I installed http://compgen.cshl.edu/phast/ which provides an Ubuntu binary. Then I ran into:
The last post in https://github.com/glennhickey/progressiveCactus/issues/67 might explain our problem.
Currently also taking a look at https://biopython.org/wiki/Multiple_Alignment_Format, but they do not state, if a conversion to FASTA is possible.