Adding installation guide.

cmungall commented 1 month ago

Fixes #114

ielis commented 1 month ago

@pnrobinson the CI action is failing because the notebook uses a relative path to a HPO JSON instead of the OntologyStore API of HPO toolkit (see here).

Please use the OntologyStore everytime you work with HPO. It would probably be better to have pyphetools.creation.TemplateImporter take hpotk.MinimalOntology instead of a str.

cmungall commented 1 month ago

Thanks! No strong opinion on the dot (I am a poetry-head myself 😄 )

And yes I was wondering about the failing action

pnrobinson commented 1 month ago

Adjust

hp_json = "../hp.json"
created_by="ORCID:0000-0002-0736-9199"
timporter = TemplateImporter(template=template, hp_json=hp_json, created_by=created_by)

in example file so that we do not need the local hp.json

pnrobinson commented 1 month ago

I updated the templates in phenopacket store. @ielis One issue is that the templates reach out to the VariantValidator API for many variants. Sometimes there is a network issue and one needs to simply rerun the cell. This can cause the CI to fail even though the code is correct. Not sure if we want to keep this in the default CI or make this action optional.

pnrobinson commented 1 month ago

@ielis It is probably better to put the CI tests in this repo -- for phenopacket-store, after we have generated the phenopackets, there is no need to rerun the scripts unless we notice a mistake. Also, in this repository we can check in pickled versions of the variant-validator response, which will avoid network issues. Was there any reason to do this via phenopacket-store or would this also be OK?

ielis commented 1 month ago

@ielis One issue is that the templates reach out to the VariantValidator API for many variants. Sometimes there is a network issue and one needs to simply rerun the cell.

Yeah, this is a pain and in general we should avoid depending on network for the tests to pass. In other projects, we mark the network-dependent tests with a special annotation @pytest.mark.online.

For instance:

@pytest.mark.online
@pytest.mark.parametrize(
    'tx_id, contig, start, end, strand',
    [
        # ANKRD11
        ('NM_013275.6', '16', 847_784, 1_070_716, Strand.NEGATIVE),
        # MAPK8IP3
        ('NM_001318852.2', '16', 1_706_194, 1_770_351, Strand.POSITIVE),
    ]
)
def test_fetch__end_to_end(
        self,
        coordinate_service: VVTranscriptCoordinateService,
        tx_id: str,
        contig: str, start: int, end: int, strand: Strand,
):
    tx_coordinates = coordinate_service.fetch(tx_id)
    assert tx_coordinates.identifier == tx_id
    # And more assertions follow...

The tested coordinate_service pings Variant Validator's REST API and, therefore, is not run unless we invoke pytest with --runonline flag:

python --runonline

The --runonline flag includes the tests annotated with @pytest.mark.online in the test suite.

We can do something similar here to remove spurious failures, and we will only run the online tests now and then. Note, a special setup must be done in order to use --runonline. @pnrobinson let me know if you'd like me to do that here.

... Also, in this repository we can check in pickled versions of the variant-validator response, which will avoid network issues. ...

I would favor json instead of pickle because JSON files are easier to update when the VV format changes. I've done something similar elsewhere, so in principle it can be done. We should, however, decide if we want to use @pytest.mark.online annotation before addressing this point.

... It is probably better to put the CI tests in this repo -- for phenopacket-store, after we have generated the phenopackets, there is no need to rerun the scripts unless we notice a mistake.

I am OK with deleting the generate_phenopackets.yml action, in order to skip running the phenopacket-store notebooks as part of pyphetools' CI. @pnrobinson pls remove the action definition YAML or let me know - I can do it too.

pnrobinson commented 1 month ago

I think it is probably OK to run the tests locally. We do not need to worry about "old" notebooks or whatever, because we only re-run the phenopackets if there was a mistake in the data. I think it is probably best to remove the CI for now. We will need to rethink the entire pipeline soon, in order to rethink the HPO annotation pipeline, and I think a lot of stuff needs to be done, but we are testing the phenoapckets with ppt etc etc. @ielis

ielis commented 1 month ago

Then I will drop the notebook action and I will deal with the network-dependent tests in #117

monarch-initiative / pyphetools

Adding installation guide. #115