Closed olekto closed 2 years ago
Hi Ole,
happy to hear that you like the idea of the package! And looking more at chromosome-scale rather than genes is definitely an interesting use case. I'd be curious about your experience going forward because this is something that people don't do that often and so I get less feedback on this kind of use than just on plotting genes.
Try use read_seqs("groups6.fasta")
to read fasta files.
read_fai()
is a low-level parser for fasta index files. Those are tab-separated files generated by samtools faidx/seqkit faidx
.
The size of the sequences/fasta file does not matter. gggenomes essentially ignores the sequences themselves, just reads the headers and counts the length of the sequences.
Hi Thomas, that worked nicely.
I had generated fai files also, so it looked like it parsed those. Not sure what happened there.
Ole
read_fai()
just reads text files line by line and expects a seq_id in the first col and a seq length in the second. That also works on a plain fasta file, though makes no sense. The error message you got was misleading. If you read a fasta file directly with read_fai
, it just reads one column instead of the expected two.
You can use read_seqs
also on different files including .fai index and .fasta files. It will internally call the right parser, i.e. read_fai
for .fai, read_seq_len()
for fasta, gff or genbank. If you already have .fai files, reading those is a lot faster than reading .fasta files.
Hi, I'm really excited about this package. Finally a customisable tool that can draw nice synteny figures!
However, I jumped right into my use-case, and started with trying to draw three homolog chromosomes from some fishes. When reading in the sequences (
group6_seq = read_fai("group6.fasta")
), I get this warning:When trying to plot this, I get this:
And then I get this error:
My R-fu isn't that strong, so maybe I messed up somewhere. However, I was wondering what limitations there are regarding sizes of the sequences that I want to read in. These are about 35 Mbp each, totalling 100 Mbp. I don't really want to look at genes and such in the first round, but more chromosome structure (inversions, centromeres, etc). Is there any reason why this shouldn't work?
Thank you.
Sincerely, Ole