plantinformatics / pretzel

Javascript full-stack framework for Big Data visualisation and analysis
GNU General Public License v3.0

Layout and finalisation of Genotype Search #392

Open Don-Isdale opened 1 week ago

Don-Isdale commented 1 week ago

Part of #383

Purpose

A key goal of the Pretzel application is to enable researchers who are not specialised in bioinformatics to access the Genotype database. The focus is therefore on usability: the workflow should be simple, and it should be clear how to use the application and proceed through the workflow / Use Case. The data display and visualisation should be easy to read and interpret, and the application should check user inputs extensively and report errors in the GUI. With the core functionality of the #383 Working Group #3 Use Case complete, this issue focuses on "polishing" the GUI and the usability of the added features.


Observable outcomes / Acceptance criteria :

User work-flow of the VCF Genotype Search panel satisfies the specification in regard to user ergonomics, and the display layout presents the required data outputs in a readable format.

Measured with : Screen-shot showing Genotype Table presentation of output data.



This is completion of work done for #391

Don-Isdale commented 1 week ago

Test-Readiness testing

General testing of 'VCF Genotype Search' panel to ascertain when it is ready for Integration Test described in #383.

Test 'VCF Genotype Search' panel using these inputs, including extra newlines.

Test Sample Names :

ExomeCapture-DAS5-001803 ExomeCapture-DAS5-001365 ExomeCapture-DAS5-002317 ExomeCapture-DAS5-001803 ExomeCapture-DAS5-001365 ExomeCapture-DAS5-002317

ExomeCapture-DAS5-002978 ExomeCapture-DAS5-003024 ExomeCapture-DAS5-003024 ExomeCapture-DAS5-003047

Test SNP Names :

scaffold38755_1207866 scaffold38755_1235130 scaffold89939_1420884 scaffold89939_1421208

scaffold38755_1207866 scaffold38755_1235130

Produces this trace in the server log :

+ bcftools query 201028_40K_DAS5_samples_XT_exomeIDs/1A_copy.MAF.vcf.gz -s ExomeCapture-DAS5-001803,ExomeCapture-DAS5-001365,ExomeCapture-DAS5-002317,ExomeCapture-DAS5-001803,ExomeCapture-DAS5-001365,ExomeCapture-DAS5-002317,ExomeCapture-DAS5-002978,ExomeCapture-DAS5-003024,ExomeCapture-DAS5-003024,ExomeCapture-DAS5-003047 -H -f '%CHROM  %ID %POS    %REF    %ALT    %INFO[  %GT]
' -i ' ID="scaffold38755_1207866" || ID="scaffold38755_1235130" || ID="scaffold89939_1420884" || ID="scaffold89939_1421208" || ID="scaffold38755_1207866" || ID="scaffold38755_1235130" '

cbWrap Error: Error: sample #7 not found in the header, user --force-samples to proceed anyway

Proceeding to investigate the cause using bcftools command line :

Test Sample Names :

ExomeCapture-DAS5-001803 ExomeCapture-DAS5-001365 ExomeCapture-DAS5-002317 ExomeCapture-DAS5-001803 ExomeCapture-DAS5-001365 ExomeCapture-DAS5-002317 ExomeCapture-DAS5-002978 ExomeCapture-DAS5-003047

bcftools query 201028_40K_DAS5_samples_XT_exomeIDs/1A_copy.MAF.vcf.gz -s ExomeCapture-DAS5-001803,ExomeCapture-DAS5-001365,ExomeCapture-DAS5-002317,ExomeCapture-DAS5-001803,ExomeCapture-DAS5-001365,ExomeCapture-DAS5-002317,ExomeCapture-DAS5-002978,ExomeCapture-DAS5-003047 -H -f '%CHROM  %ID %POS    %REF    %ALT    %INFO[  %GT]
' -i ' ID="scaffold38755_1207866" || ID="scaffold38755_1235130" || ID="scaffold89939_1420884" || ID="scaffold89939_1421208" || ID="scaffold38755_1207866" || ID="scaffold38755_1235130" '
> Error: sample #7 not found in the header, user --force-samples to proceed anyway

Removed sample #7 :

ExomeCapture-DAS5-002978

Test Sample Names :

ExomeCapture-DAS5-001803 ExomeCapture-DAS5-001365 ExomeCapture-DAS5-002317 ExomeCapture-DAS5-001803 ExomeCapture-DAS5-001365 ExomeCapture-DAS5-002317 ExomeCapture-DAS5-003047

bcftools query 201028_40K_DAS5_samples_XT_exomeIDs/1A_copy.MAF.vcf.gz -s ExomeCapture-DAS5-001803,ExomeCapture-DAS5-001365,ExomeCapture-DAS5-002317,ExomeCapture-DAS5-001803,ExomeCapture-DAS5-001365,ExomeCapture-DAS5-002317,ExomeCapture-DAS5-003047 -H -f '%CHROM   %ID %POS    %REF    %ALT    %INFO[  %GT]
' -i ' ID="scaffold38755_1207866" || ID="scaffold38755_1235130" || ID="scaffold89939_1420884" || ID="scaffold89939_1421208" || ID="scaffold38755_1207866" || ID="scaffold38755_1235130" '
> free(): invalid next size (fast)
Aborted (core dumped)

Test Sample Names :

ExomeCapture-DAS5-001803 ExomeCapture-DAS5-001365 ExomeCapture-DAS5-002317 ExomeCapture-DAS5-003047

bcftools query 201028_40K_DAS5_samples_XT_exomeIDs/1A_copy.MAF.vcf.gz -s ExomeCapture-DAS5-001803,ExomeCapture-DAS5-001365,ExomeCapture-DAS5-002317,ExomeCapture-DAS5-003047  -H -f '%CHROM %ID %POS    %REF    %ALT    %INFO[  %GT]\n' -i ' ID="scaffold38755_1207866"  || ID="scaffold38755_1235130" || ID="scaffold89939_1420884" || ID="scaffold89939_1421208" || ID="scaffold38755_1207866" || ID="scaffold38755_1235130" '
#[1]CHROM   [2]ID   [3]POS  [4]REF  [5]ALT  [6](null)   [7]ExomeCapture-DAS5-001803:GT  [8]ExomeCapture-DAS5-001365:GT  [9]ExomeCapture-DAS5-002317:GT  [10]ExomeCapture-DAS5-003047:GT
1A  scaffold38755_1207866   1207866 C   T   F_MISSING=0.0224525;NS=566;AN=1132;MAF=0.152827;AC=173;AC_Het=9 0/0 0/0 0/0 0/0
1A  scaffold38755_1235130   1235130 C   T   F_MISSING=0.0259067;NS=564;AN=1128;MAF=0.150709;AC=170;AC_Het=12    0/0 0/0 0/0 0/0
1B  scaffold89939_1420884   1420884 A   G   F_MISSING=0.0120898;NS=572;AN=1144;MAF=0.0550699;AC=63;AC_Het=9 0/0 0/0 0/0 1/1
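
A minimal sketch (not Pretzel's actual code) of normalising the two text inputs before the command is built: split on any whitespace so that extra newlines are harmless, and drop repeated names. It rests on an observation from the runs above, namely that the only run which succeeded had no repeated sample names; the traces do not prove that duplicates are the cause. `parseNames` and `buildQueryArgs` are hypothetical names, and the tab separators in the format string are assumed from the trace.

```javascript
/**
 * Split free-text input on any whitespace (spaces, tabs, blank lines)
 * and drop repeated names, keeping first-seen order.
 */
function parseNames(text) {
  const names = text.split(/\s+/).filter((name) => name.length > 0);
  return [...new Set(names)];
}

/**
 * Build the argument list for `bcftools query`, mirroring the trace above.
 * The format string uses the \t and \n escapes, which bcftools interprets.
 */
function buildQueryArgs(vcfPath, sampleText, snpText) {
  const samples = parseNames(sampleText);
  const snps = parseNames(snpText);
  const format = String.raw`%CHROM\t%ID\t%POS\t%REF\t%ALT\t%INFO[\t%GT]\n`;
  const include = snps.map((id) => `ID="${id}"`).join(' || ');
  return ['query', vcfPath, '-s', samples.join(','), '-H', '-f', format, '-i', include];
}
```

Returning an argument array rather than one command string means the list can be passed to a process-spawning call without shell quoting. The sketch does not validate the names, so a name containing a double quote would still break the `-i` expression.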

Conclusion :

Don-Isdale commented 5 days ago

Test-Readiness testing

General testing of 'VCF Genotype Search' panel to ascertain when it is ready for Integration Test described in #383.

Test with other databases on Test Server

Found that where the dataset has no ID, the name is displayed in the Genotype Table as : Chr1A_295560097_T_C

Tried : SNP Names search input 295560097 - this doesn't work because there is no ID value.

Conclusion

Found that if a dataset does not have ID values, the ID may be '.', which the Genotype Table displays as chr_position_ref_alt. It would be possible to search on POS instead of ID in this case; consider that an extension at this point. No action is required as yet, but this case could confuse a user. It may be necessary to clarify that the dataset does not have ID values, or to add this check to the VCF status report so that the data administrator can ensure the dataset is ready before making it public.
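
To make the naming rule and the proposed status check concrete, here is a small sketch under stated assumptions: records are plain objects with `chrom`, `id`, `pos`, `ref` and `alt` fields, and `hasId`, `featureName` and `countMissingIds` are hypothetical helpers, not existing Pretzel functions.

```javascript
/** True when the VCF ID column carries a real value rather than '.' or nothing. */
function hasId(record) {
  return Boolean(record.id) && record.id !== '.';
}

/** Name shown for a record: the ID if present, otherwise chr_position_ref_alt. */
function featureName(record) {
  const { chrom, pos, ref, alt } = record;
  return hasId(record) ? record.id : `${chrom}_${pos}_${ref}_${alt}`;
}

/** Number of records without an ID, e.g. for a line in a dataset status report. */
function countMissingIds(records) {
  return records.filter((record) => !hasId(record)).length;
}
```

A non-zero count from `countMissingIds` would tell the data administrator that SNP Names search cannot match those records by ID.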


Genotype Table layout

Test on the test server using one of the VCF Genotype datasets, and a small number of samples and SNP names which are present in that dataset.

Test Observations :

Columns in the Genotype Table for the input sample names are not displayed, although those names are used in the API request and appear in the API response.

The Genotype Table is displayed on the right as required, but the default width of the Genotype Table should be increased to allow for more sample columns to be viewed without scrolling.

Conclusion

Don-Isdale commented 3 days ago

Test-Readiness testing - continued

Test with more datasets on Test Server

Testing with a variety of datasets on the test server : revealed issues with

Conclusion

No current action planned for those 2 issues.

It may be worth replicating and investigating the 502 errors which occurred for large datasets - the bcftools commands are separate processes so the node.js server should still be responsive.
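
As an illustration of the separate-process point, the sketch below runs a command with Node's `child_process.spawn` and a timeout, so the event loop stays free while the command runs. It does not describe how Pretzel launches bcftools, and it says nothing about the actual cause of the 502 errors; `runCommand` and the timeout value are hypothetical.

```javascript
const { spawn } = require('child_process');

/**
 * Run a command as a child process without blocking the event loop.
 * Resolves with stdout; rejects on spawn failure, non-zero exit or timeout.
 */
function runCommand(command, args, timeoutMs) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { stdio: ['ignore', 'pipe', 'pipe'] });
    let stdout = '';
    let stderr = '';
    let timedOut = false;

    // Kill the child if it runs longer than the caller is willing to wait.
    const timer = setTimeout(() => {
      timedOut = true;
      child.kill();
    }, timeoutMs);

    child.stdout.on('data', (chunk) => { stdout += chunk; });
    child.stderr.on('data', (chunk) => { stderr += chunk; });
    child.on('error', (err) => { clearTimeout(timer); reject(err); });
    child.on('close', (code, signal) => {
      clearTimeout(timer);
      if (timedOut) {
        reject(new Error(`timed out after ${timeoutMs} ms`));
      } else if (code !== 0) {
        reject(new Error(`exit ${code ?? signal}: ${stderr.trim()}`));
      } else {
        resolve(stdout);
      }
    });
  });
}
```

When replicating the large-dataset case, logging the elapsed time around such a call would show whether the command itself or something in front of the server is exceeding its limit.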

Don-Isdale commented 2 days ago

Test-Readiness testing - continued

Test on test server

Testing showed that the genotype values were requested and received correctly, but the requested Sample columns were not displayed in the Genotype Table.

Solved by :

Testing with some larger datasets, which used soft-links to the genome repository volume, showed a need to change the criteria that identify the .vcf.gz files to search in genotype-search. Solved by :

Test Consecutive Searches

Test : after completing a 'VCF Genotype Search', an additional search was performed on a different dataset.

Result : the results for the 2nd search were displayed. The SNPs of the 1st search remained displayed, but the Samples were not selected in that dataset, and hence were not displayed in the Genotype Table. After re-selecting the samples for the first dataset, the samples and genotype values were displayed in the Genotype Table.

The following screenshots show the result after the 2nd search, and after re-selecting the samples for the first dataset. These are screenshots of different searches; the point they demonstrate is that the samples and genotype values of the first search are not displayed until they are re-selected.

Screenshot from 2024-06-26 16-54-18

Screenshot from 2024-06-26 16-47-42

Conclusion

Ensure that the requested samples for the dataset of the first search remain selected after a consecutive search.
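
One possible shape for this, as a sketch only: keep the selected samples in a map keyed by dataset, so that a later search adds an entry for its own dataset instead of replacing the earlier selection. `SampleSelections` and the dataset identifiers are hypothetical and do not reflect the existing Pretzel data model.

```javascript
/** Tracks the selected sample names separately for each dataset. */
class SampleSelections {
  constructor() {
    this.byDataset = new Map();
  }

  /** Record the samples requested by a search, merged with any already selected. */
  addSearch(datasetId, sampleNames) {
    const selected = this.byDataset.get(datasetId) ?? new Set();
    sampleNames.forEach((name) => selected.add(name));
    this.byDataset.set(datasetId, selected);
  }

  /** Samples currently selected for a dataset, in the order they were added. */
  selectedFor(datasetId) {
    return [...(this.byDataset.get(datasetId) ?? [])];
  }
}
```

With this structure, a search on a second dataset leaves the first dataset's entry untouched, which is the behaviour the conclusion asks for.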

Don-Isdale commented 2 days ago

Test case

Testing with these Feature names :

AVRIG14048
Rht-B1
AVRIG39540

The results were displayed, but not for AVRIG14048. Searching for just that SNP didn't work in one test, and did work in another.

Conclusion