weizhouUMICH / SAIGE

Very chatty logs #304

Closed apoliakov closed 2 years ago

apoliakov commented 3 years ago

Hey guys!

When running SAIGE-Gene on about 400K samples we are getting very big log files. Mostly it's these lines: https://github.com/weizhouUMICH/SAIGE/blob/a3fc49cfb651bb19878a99457abbc1e2261443d5/src/SAIGE_readDosage_bgen.cpp#L536 https://github.com/weizhouUMICH/SAIGE/blob/a3fc49cfb651bb19878a99457abbc1e2261443d5/src/SAIGE_readDosage_bgen.cpp#L695

They repeat over and over. Running just a handful of genes can generate 1.4 GB of log output. The volume probably grows with the number of analyzed samples and with high missingness. We save all the log files, so if we run a whole set of genes, we have to worry about shuffling around gigabytes of logs. Writing this much output probably hurts performance too, and could make SAIGE-Gene slower even if the output isn't saved to a file.
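To illustrate the kind of change that would help, here is a minimal C++ sketch of throttling repeated messages: print each distinct message a few times, then only count it and emit one summary at the end. `ThrottledLog` and its methods are hypothetical names for illustration, not existing SAIGE code, and the sketch assumes the repeated messages are identical strings.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper (not part of SAIGE): prints each distinct message at
// most `limit` times, then only counts further repeats.
class ThrottledLog {
public:
    ThrottledLog(std::ostream& out, std::size_t limit)
        : out_(out), limit_(limit) {}

    // Print the message unless it has already been printed `limit` times.
    void log(const std::string& msg) {
        std::size_t& n = counts_[msg];
        ++n;
        if (n <= limit_) {
            out_ << msg << '\n';
        }
    }

    // Write one summary line per message that was suppressed at least once.
    void summarize() {
        for (const auto& kv : counts_) {
            if (kv.second > limit_) {
                out_ << kv.first << " (repeated " << kv.second << " times, "
                     << (kv.second - limit_) << " suppressed)\n";
            }
        }
    }

private:
    std::ostream& out_;
    std::size_t limit_;
    std::map<std::string, std::size_t> counts_;
};
```

With something like `ThrottledLog log(std::cout, 3);`, each repeated message would show up three times plus one summary line, so the log size would stop scaling with the number of repeats.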

weizhouUMICH commented 2 years ago

Sorry for the late reply! We have just released a new version, 1.0.0. It has substantial computational efficiency improvements for both Step 1 and Step 2, for single-variant and set-based tests, as well as clearer log output. We have created a new GitHub page for the program at https://github.com/saigegit/SAIGE with documentation at https://saigegit.github.io/SAIGE-doc/ and the program will be maintained there by multiple SAIGE developers. The docker image has been updated. Please feel free to try version 1.0.0 and report any issues.

Thanks! Wei