jkbonfield / io_lib

Staden Package "io_lib" (sometimes referred to as libstaden-read by distributions). This contains code for reading and writing a variety of Bioinformatics / DNA Sequence formats.

converting scf to ab1/abi #54

Open PedroDaPos opened 10 months ago

PedroDaPos commented 10 months ago

Hi! I am using Staden/io_lib via a conda installation to convert scf files to ab1/abi (the context here is that there is an R package I am using for trace analysis, for a reproducible example, but it only accepts ab1 files). I am running the following command:

convert_trace scf ab1 < input_file.scf > output_file.ab1

Everything seems to work fine, but when I try to use this file, neither the R package nor even Geneious recognizes it as ab1 format. I was wondering if you would be able to give me some advice on what I might be doing wrong. Thank you in advance for your help!

jkbonfield commented 10 months ago

Hello,

Sorry for the slow reply.

On Fri, Dec 15, 2023 at 12:39:45PM -0800, PedroDaPos wrote:

> Hi! I am using Staden/io_lib via a conda installation to convert scf files to ab1/abi (the context here is that there is an R package I am using for trace analysis, for a reproducible example, but it only accepts ab1 files). I am running the following command:
>
> convert_trace scf ab1 < input_file.scf > output_file.ab1
>
> Everything seems to work fine, but when I try to use this file, neither the R package nor even Geneious recognizes it as ab1 format. I was wondering if you would be able to give me some advice on what I might be doing wrong. Thank you in advance for your help!

It's been decades since I worked on that code! Everything involving AB1 was reverse engineered by various groups because ABI wanted to keep it top secret and lock people into their format. The basic structure is well known (it's a tagged key-value pair format), but there are a lot of extra ancillary fields which may or may not be needed by various tools.
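To illustrate what "tagged key-value pair format" means in practice, here is a minimal Python sketch that lists the tags in an AB1 (ABIF) file's directory. The byte layout it assumes (an `ABIF` magic, a 2-byte version, then 28-byte big-endian directory entries) comes from commonly circulated descriptions of the format, not from this thread, and the names `DIR_ENTRY` and `list_abif_tags` are made up for the example. It is not io_lib's reader.

```python
import struct

# Assumed layout of one ABIF directory entry (big-endian, 28 bytes):
# tag name (4 chars), tag number, element type, element size,
# number of elements, data size, data offset, data handle.
DIR_ENTRY = struct.Struct(">4sihhiiii")


def list_abif_tags(data: bytes):
    """Return (tag name, tag number, data size) for each directory entry."""
    if data[:4] != b"ABIF":
        raise ValueError("not an ABIF file (missing 'ABIF' magic)")
    # The root entry sits right after the magic (4 bytes) and version (2 bytes).
    # Its element count is the number of tags; its offset locates the directory.
    _, _, _, _, num_entries, _, dir_offset, _ = DIR_ENTRY.unpack_from(data, 6)
    tags = []
    for i in range(num_entries):
        name, number, _, _, _, data_size, _, _ = DIR_ENTRY.unpack_from(
            data, dir_offset + i * DIR_ENTRY.size
        )
        tags.append((name.decode("ascii", errors="replace"), number, data_size))
    return tags
```

Running this over files written by different instruments or tools would show how much the set of tags varies, which is why a file that is structurally valid may still be rejected by a particular reader.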

Io_lib never implemented AB1 writing, only reading. This was deliberate, as we didn't want to promote secrecy and the use of proprietary file formats. It is a bug, though, that the tool didn't give an error for that usage. No doubt it wrote out ZTR instead. Try doing "less output_file.ab1" and you'll see the header bytes.
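The header-byte check can also be scripted. The following is a minimal Python sketch that guesses the trace format from the leading magic bytes. The function names are hypothetical, and the magic values are assumptions based on the published ZTR and SCF formats and commonly described ABIF files (the `ABIF` marker at offset 0, or at offset 128 when a MacBinary header is present).

```python
# Magic bytes assumed for the trace formats discussed in this thread.
ZTR_MAGIC = b"\xaeZTR\r\n\x1a\n"
SCF_MAGIC = b".scf"
ABIF_MAGIC = b"ABIF"


def sniff_trace_format(data: bytes) -> str:
    """Guess a trace file's format from its first bytes."""
    if data.startswith(ZTR_MAGIC):
        return "ZTR"
    if data.startswith(SCF_MAGIC):
        return "SCF"
    # ABIF files may carry a 128-byte MacBinary header before the magic.
    if data.startswith(ABIF_MAGIC) or data[128:132] == ABIF_MAGIC:
        return "ABIF"
    return "unknown"


def sniff_trace_file(path: str) -> str:
    """Read just enough of the file at `path` to identify its format."""
    with open(path, "rb") as fh:
        return sniff_trace_format(fh.read(132))
```

Calling `sniff_trace_file("output_file.ab1")` on the converted file would show whether it is really ZTR despite its extension.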

I think you'd be better off speaking to the authors of the R package and requesting that they support one of the public file formats instead. (I see Geneious supports SCF.)

James

-- James Bonfield, The Sanger Institute, Hinxton, Cambs, CB10 1SA

PedroDaPos commented 10 months ago

Thank you for clarifying this, I really appreciate it! I reached out to the R package authors. Have a nice new year!