statgen / Minimac4

Help to upgrade test suite from version 4.0.3 to 4.1.1 #54

Closed tillea closed 1 year ago

tillea commented 1 year ago

Hi, the Debian package of minimac4 contains a CI test that runs

minimac4 \
         --refHaps refPanel.m3vcf.gz \
         --haps targetStudy.vcf.gz \
         --prefix testRun

with some data obtained from an old minimac3 archive. This works with version 4.0.3. I now want to upgrade the test to work with version 4.1.1 to prepare the latest version of minimac4 for the next Debian release. When I run this command I get:

Warning: --refHaps is deprecated
Warning: --haps is deprecated
Warning: --prefix is deprecated in favor of --output, --empirical-output, and --sites
minimac v4.1.0

Error: could not load reference file index (reference must be an indexed MVCF)
Notice: M3VCF files must be updated to an MVCF encoded file. This can be done by running `minimac4 --update-m3vcf input.m3vcf.gz > output.msav`
Error: could not stat reference file

Thus I tried running

$ minimac4 --update-m3vcf refPanel.m3vcf.gz > refPanel.msav
minimac v4.1.0
$ minimac4 \
         --refHaps refPanel.msav \
         --haps targetStudy.vcf.gz \
         --output testRun
Warning: --refHaps is deprecated
Warning: --haps is deprecated
minimac v4.1.0

Imputing 6:1-892068 ...
Loading target haplotypes ...
Error: cannot query region (6:1-3892068) from target file. Target file must be indexed.
Error: failed loading target haplotypes

What I need is a sensible, working test for the new version, either by fixing the existing test or by crafting a new one with your help. Could you suggest something I can run as a kind of test suite? (Please note: I'm not a minimac4 user, nor even someone with a background in bioinformatics. I'm just a Debian developer supporting bioinformaticians, so I'm lacking the background on all these file formats and on what test might make sense.)

BTW, the executable is reporting the wrong version number. I compiled version 4.1.1 but it is claiming 4.1.0.

Thanks a lot in advance for any help, Andreas.

jonathonl commented 1 year ago

Version v4.1 introduced support for random access, and the target VCF file (targetStudy.vcf.gz) now needs to be indexed (i.e., it needs a corresponding .csi or .tbi file). The index can be generated with bcftools index targetStudy.vcf.gz.
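A test script could guard against the missing-index error along these lines. This is only a sketch: has_index and ensure_index are hypothetical helper names, and the only external call is bcftools index, which writes a .csi index by default.

```shell
#!/bin/sh
# Sketch only: has_index and ensure_index are hypothetical helper names.

# Succeed if a .csi or .tbi index file sits next to the given file.
has_index() {
    [ -f "$1.csi" ] || [ -f "$1.tbi" ]
}

# Create the index only when it is missing.
# `bcftools index` writes a .csi index by default.
ensure_index() {
    has_index "$1" || bcftools index "$1"
}

# Example call in a test script (file name taken from this thread):
#   ensure_index targetStudy.vcf.gz
```

Checking for the index first keeps the script idempotent, so rerunning the test does not fail on an index that already exists.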

As for the version, is it possible that you had a merge conflict in CMakeLists.txt? Running grep project CMakeLists.txt will yield the version used in the executable.
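The comparison between the declared and the reported version could be scripted roughly as follows. Both function names are hypothetical; the banner format ("minimac v4.1.0") is taken from the output quoted in this thread, and the assumption that minimac4 --version prints that banner is mine, not confirmed here.

```shell
#!/bin/sh
# Sketch only: both function names are hypothetical.

# Print the version declared in a CMake `project(... VERSION x.y.z)` line.
cmake_project_version() {
    sed -n 's/^[[:space:]]*project(.*VERSION[[:space:]]\{1,\}\([0-9][0-9.]*\).*/\1/p' "$1" | head -n 1
}

# Print the version from a banner line such as "minimac v4.1.0" read on stdin.
banner_version() {
    sed -n 's/^minimac v\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Example comparison (assumes `minimac4 --version` prints the banner):
#   declared=$(cmake_project_version CMakeLists.txt)
#   reported=$(minimac4 --version 2>&1 | banner_version)
#   [ "$declared" = "$reported" ] || echo "version mismatch: $declared vs $reported" >&2
```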

jonathonl commented 1 year ago

Where did you get targetStudy.vcf.gz from? I can index it for you if you don't have bcftools readily available.

jonathonl commented 1 year ago

Also, the diff commands (https://salsa.debian.org/med-team/minimac4/-/blob/master/debian/tests/run-sample-analysis#L30-31) using the existing expected files will likely fail. The record annotations (INFO fields) have been updated and optimizations in the statistical model could produce slightly different results than v4.0. Another difference is that the {prefix}.info file is now {prefix}.sites.vcf.gz.
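Since byte-exact diffs against the v4.0 expected files are likely to fail, a looser smoke check may be enough for a packaging test. The sketch below only asserts that the output files exist and are non-empty; assert_nonempty is a hypothetical helper name, and the example path assumes the run used the prefix testRun.

```shell
#!/bin/sh
# Sketch only: assert_nonempty is a hypothetical helper name.

# Fail with a message unless every named file exists and has a size > 0.
assert_nonempty() {
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty output: $f" >&2
            return 1
        fi
    done
}

# Example call, assuming the run wrote a sites file under the prefix testRun:
#   assert_nonempty testRun.sites.vcf.gz
```

This trades precision for robustness: it catches a crashed or empty run but not a change in imputed values.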

tillea commented 1 year ago

Hi Jonathon,

On Wed, Jan 18, 2023 at 01:15:01PM -0800, Jonathon LeFaive wrote:

> Also, the diff commands (https://salsa.debian.org/med-team/minimac4/-/blob/master/debian/tests/run-sample-analysis#L30-31) using the existing expected files will likely fail. The record annotations (INFO fields) have been updated and optimizations in the statistical model could produce slightly different results than v4.0. Another difference is that the {prefix}.info file is now {prefix}.sites.vcf.gz.

Thanks a lot for your quick responses. There is no real point in sticking to the old test. If you could provide me with a sensible data set and a command line to run, along with some expected result, that would be perfectly fine.

Kind regards, Andreas.

tillea commented 1 year ago

On Wed, Jan 18, 2023 at 12:39:23PM -0800, Jonathon LeFaive wrote:

> As for the version, is it possible that you had a merge conflict in CMakeLists.txt? Running grep project CMakeLists.txt will yield the version used in the executable.

I confirm that I have

CMakeLists.txt:project(minimac4 VERSION 4.1.1)

and I wonder why the output is 4.1.0. I'll check this later.

jonathonl commented 1 year ago

I have incorporated a test into the repo (https://github.com/statgen/Minimac4/commit/822c4a24143164493e5b1e9e9ee61f37e6c39d26). The deb package bcftools is required to run the test.

cmake -DCMAKE_TOOLCHAIN_FILE=../cget/cget/cget.cmake -DBUILD_TESTS=ON ..
make
make CTEST_OUTPUT_ON_FAILURE=1 test

If needed, I can bump version with test target to 4.1.2.
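Because the test target depends on bcftools, a packaging script might check for the needed tools before building. A minimal sketch, with require_tool as a hypothetical helper name:

```shell
#!/bin/sh
# Sketch only: require_tool is a hypothetical helper name.

# Succeed if every named command is on PATH; otherwise report the first one missing.
require_tool() {
    for t in "$@"; do
        if ! command -v "$t" >/dev/null 2>&1; then
            echo "required tool not found: $t" >&2
            return 1
        fi
    done
}

# Example call before running the test target:
#   require_tool bcftools cmake make || exit 1
```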

tillea commented 1 year ago

On Thu, Jan 19, 2023 at 02:06:22PM -0800, Jonathon LeFaive wrote:

> I have incorporated a test into the repo (https://github.com/statgen/Minimac4/commit/822c4a24143164493e5b1e9e9ee61f37e6c39d26). The deb package bcftools is required to run the test.

Very cool.

> If needed, I can bump version with test target to 4.1.2.

This would actually be helpful.

Thanks a lot, Andreas.

jonathonl commented 1 year ago

v4.1.2 release has been created.