opencb / biodata

Java library that models biological entities and their equivalents in different file formats typically used in bioinformatics
Apache License 2.0

Store VCF ID value as a FileEntry attribute #176

Open j-coll opened 4 years ago

j-coll commented 4 years ago

In the current implementation, we lose the link between an ID (the ID column of the VCF) and the file it came from. This is not a problem when the ID is an rs identifier, but when it is an internal identifier generated by the caller (see https://github.com/Illumina/manta), it needs to stay related to the original file.

This ID should be added to the FileEntry attributes, together with FILTER and QUAL, using another reserved key, ID:

chr1  1000  rs1234  A   C   100  PASS   DP=30
{
  id: "chr1:1000:A:C",
  names: ["rs1234"],
  studies: [{
    files : [{
      attributes: {
        "ID": "rs1234",
        "QUAL": "100",
        "FILTER": "PASS",
        "DP": "30"
      }
    }]
  }]
}
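
The mapping above could be sketched roughly as follows. This is only an illustration, not the biodata implementation: the class `VcfFileAttributes` and the method `fromLine` are hypothetical names, and the sketch assumes a tab-separated record in the standard VCF column order (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). A missing ID (".") is simply not stored.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the biodata API.
class VcfFileAttributes {

    // Builds the per-file attributes map from one tab-separated VCF record,
    // storing the ID column under the reserved key "ID" next to QUAL and FILTER.
    static Map<String, String> fromLine(String vcfLine) {
        String[] fields = vcfLine.split("\t");
        if (fields.length < 8) {
            throw new IllegalArgumentException("Expected at least 8 VCF columns, got " + fields.length);
        }

        Map<String, String> attributes = new LinkedHashMap<>();
        // Column 3 (index 2) is the VCF ID; "." means the value is missing.
        if (!".".equals(fields[2])) {
            attributes.put("ID", fields[2]);
        }
        attributes.put("QUAL", fields[5]);
        attributes.put("FILTER", fields[6]);

        // INFO is a ';'-separated list of KEY=VALUE pairs or bare flags.
        if (!".".equals(fields[7])) {
            for (String entry : fields[7].split(";")) {
                int eq = entry.indexOf('=');
                if (eq < 0) {
                    attributes.put(entry, ""); // flag without a value
                } else {
                    attributes.put(entry.substring(0, eq), entry.substring(eq + 1));
                }
            }
        }
        return attributes;
    }

    public static void main(String[] args) {
        // Prints {ID=rs1234, QUAL=100, FILTER=PASS, DP=30}
        System.out.println(fromLine("chr1\t1000\trs1234\tA\tC\t100\tPASS\tDP=30"));
    }
}
```

Because the ID is kept per file, an internal ID such as a Manta call identifier would remain attached to the file that produced it, even when several files contribute to the same variant.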
imedina commented 4 years ago

Nice ticket, just one question: will variant.names always contain the union of all VCF IDs found?