suhrig / arriba

Fast and accurate gene fusion detection from RNA-Seq data

Accept Structural Variant VCF #39

Closed by DarioS 4 years ago

DarioS commented 4 years ago

Most structural variant callers output results in VCF. To make it easier to use the output of one program as the input to another, it would be desirable to be able to specify a VCF file using -d.

suhrig commented 4 years ago

Good point. Originally, I tried to keep it simple for the end user and decided to accept a four-column text file, because the output format of SV callers was not well standardized in the past. Each tool had a different format, so the most flexible yet user-friendly solution was to accept a format with the minimum information. But now that many SV callers support VCF, accepting this format has become the more user-friendly solution. I am not yet 100% sure how to implement this while keeping Arriba backward compatible. I will think about it. I will probably make Arriba auto-detect the format. Hopefully, this feature will be available with the next release. I will keep this issue open until it's implemented.

suhrig commented 4 years ago

I just pushed an enhanced version of Arriba to the develop branch, which accepts structural variants in VCF format and the old four-column format. The format is auto-detected. If you need this feature urgently, you can obtain the new version using these commands:

git clone https://github.com/suhrig/arriba.git
cd arriba
git checkout develop
make

Please be aware that this is a development version of Arriba. Although it should run fine in practice, it has not been tested as thoroughly as official releases. Also, there will definitely be additional changes to the output format (new columns added, some rearranged) in the next official release, so please be flexible with regard to changes. Alternatively, you can just wait for the next official release.
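
For readers wondering what format auto-detection could look like, here is a minimal Python sketch. It is not Arriba's actual implementation (Arriba is not written in Python, and the thread does not describe its detection logic). It relies only on the VCF specification's requirement that the first line of a VCF file is a `##fileformat=VCF...` header, and it assumes the old format can be recognized by having exactly four tab-separated fields. The function name `detect_sv_format` is hypothetical.

```python
def detect_sv_format(lines):
    """Guess the format of a structural variant file from its first non-blank line.

    Returns "vcf", "four-column" or "unknown".
    Illustrative only; not taken from Arriba's source code.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip leading blank lines
        # The VCF specification requires the file to start with this header.
        if line.startswith("##fileformat=VCF"):
            return "vcf"
        # Assumption: the legacy format has exactly four tab-separated columns.
        if len(line.split("\t")) == 4:
            return "four-column"
        return "unknown"
    return "unknown"  # empty input


if __name__ == "__main__":
    print(detect_sv_format(["##fileformat=VCFv4.2\n", "#CHROM\tPOS\tID\n"]))
    print(detect_sv_format(["a\tb\tc\td\n"]))
```

Deciding on the first meaningful line keeps the check cheap and means existing four-column files continue to work without any new command-line option.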

DarioS commented 4 years ago

I will wait for the release version. I am having problems with my DNA structural variant caller and can't test Arriba. I am using GRIDSS, which outputs single breakends (other end might be a virus or retrotransposon) as well as the more typical breakpoints (both breakends are in the human genome) and all variants have SVTYPE set to BND. Hopefully, this won't cause errors when I can try filtering.

suhrig commented 4 years ago

All standard SVTYPEs are supported. I have not tested single breakends, though. Thanks for bringing this to my attention. Arriba probably issues a warning and ignores the line at the moment, which would be fine.

suhrig commented 4 years ago

Hi Dario, version 2.0.0 is finally out, which comes with the ability to parse structural variant VCF files. Single break ends are not supported, however, because Arriba does not consider those anyway. Regards, Sebastian
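
Since single breakends are not supported, it can help to see how they differ from paired breakends in the ALT column when all records carry SVTYPE=BND. The Python sketch below follows the breakend notation of the VCF specification: paired breakends look like `t[p[`, `t]p]`, `]p]t` or `[p[t` (with `p` being the mate's `contig:position`), while single breakends look like `t.` or `.t`. It is an illustration for pre-filtering a caller's output, not Arriba's parser, and the names `PAIRED_ALT` and `classify_breakend` are hypothetical.

```python
import re

# Paired breakend ALT: optional bases, a bracket, "contig:position" of the mate,
# the same bracket again, optional bases.
PAIRED_ALT = re.compile(r"^[A-Za-z]*([\[\]])(.+):(\d+)\1[A-Za-z]*$")


def classify_breakend(alt):
    """Classify a VCF ALT value as a paired breakend, a single breakend, or other.

    Returns a tuple (kind, mate_contig, mate_position).
    """
    match = PAIRED_ALT.match(alt)
    if match:
        return ("paired", match.group(2), int(match.group(3)))
    # Single breakends have a leading or trailing dot next to at least one base.
    if len(alt) > 1 and (alt.startswith(".") or alt.endswith(".")):
        return ("single", None, None)
    return ("other", None, None)


if __name__ == "__main__":
    for alt in ["G]17:198982]", "]13:123456]T", "C.", ".TGCA", "<DEL>"]:
        print(alt, classify_breakend(alt))
```

A filter built on this could drop the records classified as "single" before the file is passed to a downstream tool, if warnings about unsupported lines are unwanted.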