hail-is / hail

Cloud-native genomic dataframes and batch computing
https://hail.is
MIT License

export PLINK exports invalid results #4508

Closed tpoterba closed 5 years ago

tpoterba commented 5 years ago

ExportPLINK sorts the columns before exporting the .fam file.

tpoterba commented 5 years ago

We can probably unkey the columns in Python to fix this. This also needs a test.

tpoterba commented 5 years ago

also:

jigold commented 5 years ago

@tpoterba Do you have a specific example where this fails? I think the columns are already unkeyed before export with this line:

    dataset = dataset._select_all(col_exprs=fam_exprs,
                                  col_key=[],
                                  row_exprs=bim_exprs,
                                  entry_exprs=entry_exprs)

I tried making the Python test more robust by permuting the columns first, so that they are not in alphabetical order before exporting, but I couldn't replicate the error.

The same is true for export_gen.
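
For illustration, the shape of such a permuted-column check can be sketched in plain Python. This is only a minimal sketch, not the actual Hail test: it does not call Hail at all, and `write_fam`, `read_fam_ids`, `permuted_ids`, and `fam_round_trip` are hypothetical helpers. It assumes the standard six-column PLINK .fam layout and checks that a non-alphabetical sample order survives being written out and read back.

```python
import os
import random
import tempfile


def write_fam(sample_ids, path):
    """Write one .fam row per sample, in the order given (no sorting)."""
    with open(path, "w") as out:
        for iid in sample_ids:
            # Columns: FID, IID, father, mother, sex, phenotype.
            # "0" marks a missing family/parent/sex value, "-9" a missing phenotype.
            out.write(f"0 {iid} 0 0 0 -9\n")


def read_fam_ids(path):
    """Return the individual IDs (second column) in file order."""
    with open(path) as f:
        return [line.split()[1] for line in f if line.strip()]


def permuted_ids(n, seed=0):
    """Hypothetical helper: n sample IDs shuffled so they are not alphabetical."""
    if n < 2:
        raise ValueError("need at least two samples to permute")
    ids = [f"s{i:03d}" for i in range(n)]
    rng = random.Random(seed)
    shuffled = ids[:]
    # Reshuffle until the order actually differs from the sorted order.
    while shuffled == ids:
        rng.shuffle(shuffled)
    return shuffled


def fam_round_trip(sample_ids):
    """Write sample_ids to a temporary .fam file and read the order back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.fam")
        write_fam(sample_ids, path)
        return read_fam_ids(path)


ids = permuted_ids(10)
# The exported order must match the input order, not the sorted order.
assert fam_round_trip(ids) == ids
```

In a real Hail test the round trip would presumably go through `hl.export_plink` and `hl.import_plink` instead of these helpers, with the same assertion on the sample order.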

tpoterba commented 5 years ago

I could be wrong. I encountered the "reordering cols" warning during export_plink while looking at LD prune. I'll take a closer look today.

tpoterba commented 5 years ago

but if you're confident that it's working as intended, I am too!

jigold commented 5 years ago

I'm confident it's working.