zhengxwen / SeqArray

Data management of large-scale whole-genome sequence variant calls (Development version only)
http://www.bioconductor.org/packages/SeqArray

seqMerge error #91

Closed davidroberson closed 2 months ago

davidroberson commented 3 months ago

I split a large BCF file (with bcftools view) and converted each shard to a GDS file. I am now trying to merge the shards back together, but not all of the smaller GDS files will merge. Occasionally I get this error:

Error in .append_node_variant(gfile, "position", "int32", storage.option, :
  Invalid number of variants in 'position'.
Calls: seqMerge -> .append_node_variant

I have been careful to not split near variant positions. Any ideas? Thanks.

davidroberson commented 3 months ago

If the original file had multiallelic sites split across separate records, would this cause the error shown above?

davidroberson commented 3 months ago

Just a kind ping. Please help :)

zhengxwen commented 3 months ago

Not sure what the issue is. Did you split the BCF file by variants or by samples? If it was split by variants, then seqMerge() can simply concatenate the data.

Also, show me the session information: sessionInfo().

zhengxwen commented 2 months ago

SeqArray_1.44.1 (https://github.com/zhengxwen/SeqArray/commit/702488b6b98156998e07b34b49cfb3585d14e370) should fix this issue.

davidroberson commented 2 months ago

Thank you. I will confirm today.

davidroberson commented 2 months ago

Thank you, @zhengxwen. I can confirm that this fix worked. It allowed me to build a split-apply-combine (seqMerge) workflow to speed up a conversion pipeline we have.
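
For readers following along, here is a minimal sketch of what such a split, convert, and merge workflow might look like in R. It is not the poster's actual pipeline: the helper names shard_gds_names() and convert_and_merge() and all file names are hypothetical, and it assumes the BCF shards were already produced by splitting on variants, that SeqArray 1.44.1 or later is installed, and that bcftools is available on the PATH (seqBCF2GDS() calls it).

```r
# Hypothetical helper: derive a .gds file name for each .bcf shard.
shard_gds_names <- function(bcf_files) {
    sub("\\.bcf$", ".gds", bcf_files)
}

# Hypothetical wrapper: convert each BCF shard to GDS, then merge the shards.
# Requires the SeqArray package and a bcftools executable on the PATH.
convert_and_merge <- function(bcf_files, out_gds) {
    # The fix discussed in this thread is in SeqArray 1.44.1.
    stopifnot(utils::packageVersion("SeqArray") >= "1.44.1")

    gds_files <- shard_gds_names(bcf_files)

    # "Apply" step: write one GDS file per shard.
    for (i in seq_along(bcf_files)) {
        SeqArray::seqBCF2GDS(bcf_files[i], gds_files[i], verbose = FALSE)
    }

    # "Combine" step: merge the per-shard GDS files into a single file.
    SeqArray::seqMerge(gds_files, out_gds, verbose = FALSE)
    invisible(out_gds)
}

# Example call with placeholder file names (not run here):
# convert_and_merge(c("shard_01.bcf", "shard_02.bcf"), "merged.gds")
```

The conversion loop is the part that could be parallelised across shards, which is presumably where the speed-up comes from; the merge itself is a single call.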