ireceptor-plus / issues


Implement ADC API on ImmuneDB #18

Closed: bcorrie closed this issue 1 year ago

bcorrie commented 4 years ago

Need to have an ImmuneDB repository that implements the ADC API so we can query it from the iReceptor Gateway.

bcorrie commented 4 years ago

@areensh I am hoping that we will be able to use the AIRR Data Commons API to query an ImmuneDB repository (at least in alpha or beta) for the M24 deliverable. Would you be able to comment on where your implementation is at? I think in our last conversation with Uri, he mentioned that it was basically working with a few bugs?

areensh commented 4 years ago

@bcorrie we are in the testing stage: we are running the ADC API test suite and also building our own tests. Hopefully we will soon have a version of the implementation ready to be integrated into the ImmuneDB repository.

systemimmunologylab commented 4 years ago

One issue we would like some clarification about:

Do you have examples of tests that involve queries across multiple samples within a repertoire? We could not find one; they all have one repertoire with one sample.

If all our queries are at the single-sample-per-repertoire level, then translating things to the ADC API is really a one-to-one mapping, and even clones are easy to add. But when there are multiple samples, even at the sequence level (before counting clones), it matters whether you are counting copies, unique sequences across samples, or unique sequences per sample (what we call instances).

Once we get past this, the next issue is queries that involve multiple repertoires, but I think that can be dealt with once we have a working ADC API.
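A minimal Python sketch of the three counting modes described above, using hypothetical toy reads (the sample IDs and sequences are made up for illustration and do not come from ImmuneDB):

```python
from collections import Counter

# Hypothetical toy data: one (sample_id, sequence) pair per read.
reads = [
    ("s1", "CARDYW"), ("s1", "CARDYW"), ("s1", "CASSLW"),
    ("s2", "CARDYW"), ("s2", "CTTGYW"),
]


def count_copies(reads):
    """Copies: every read counts, duplicates included."""
    return len(reads)


def count_unique_across_samples(reads):
    """Unique sequences in the repertoire, ignoring which sample they came from."""
    return len({seq for _, seq in reads})


def count_instances(reads):
    """Unique sequences per sample ("instances"): a sequence seen in two samples counts twice."""
    return len(set(reads))


def copies_per_instance(reads):
    """Copy number of each instance, analogous to a duplicate_count on a collapsed record."""
    return Counter(reads)


print(count_copies(reads), count_unique_across_samples(reads), count_instances(reads))  # 5 3 4
```

With a single sample the unique-across-samples count and the instance count coincide, which is why the one-sample-per-repertoire case maps one-to-one.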

U


bcorrie commented 4 years ago

> One issue we would like some clarification about: Do you have examples of tests that involve queries across multiple samples within a repertoire? We could not find one; they all have one repertoire with one sample.

Many of the queries will return several samples from a single repertoire as long as the data has such a structure.

For example, suppose a repertoire has a single subject with two samples, one with tissue 'blood' and one with tissue 'bone marrow', each with a pcr_target_locus of "IGH". If both samples also have a "cell_number" < 10000, then the following query would return a repertoire with multiple samples:

https://github.com/airr-community/adc-api-tests/blob/master/repertoire/pass-and-op.json
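As an illustration only (this is not the literal content of pass-and-op.json), here is a Python sketch of an ADC API-style "and" filter for the scenario above, plus a local evaluator. The field paths and the toy repertoire are assumptions based on the AIRR Repertoire layout, where "sample" and "pcr_target" are arrays; check them against the AIRR Schema version your repository implements. The "any element matches" rule for array fields is also an assumption of this sketch.

```python
import json

# Illustrative ADC API-style query: IGH-targeted samples with fewer than 10000 cells.
query = {
    "filters": {
        "op": "and",
        "content": [
            {"op": "=", "content": {"field": "sample.pcr_target.pcr_target_locus", "value": "IGH"}},
            {"op": "<", "content": {"field": "sample.cell_number", "value": 10000}},
        ],
    }
}
body = json.dumps(query)  # what a client would POST to /airr/v1/repertoire

OPS = {"=": lambda a, b: a == b, "<": lambda a, b: a < b}


def values_at(obj, path):
    """Collect every value reachable at a dotted path, descending into lists."""
    if isinstance(obj, list):
        return [v for item in obj for v in values_at(item, path)]
    if not path:
        return [obj]
    head, _, rest = path.partition(".")
    if not isinstance(obj, dict) or head not in obj:
        return []
    return values_at(obj[head], rest)


def matches(record, flt):
    """A leaf matches if ANY value at its field path satisfies the operator (assumed semantics)."""
    op = flt["op"]
    if op == "and":
        return all(matches(record, f) for f in flt["content"])
    if op == "or":
        return any(matches(record, f) for f in flt["content"])
    field, value = flt["content"]["field"], flt["content"]["value"]
    return any(OPS[op](v, value) for v in values_at(record, field))


# Hypothetical repertoire: one subject, two samples (blood and bone marrow), both IGH.
repertoire = {
    "repertoire_id": "rep-1",
    "subject": {"subject_id": "subj-1"},
    "sample": [
        {"sample_id": "blood-1", "tissue": "blood", "cell_number": 5000,
         "pcr_target": [{"pcr_target_locus": "IGH"}]},
        {"sample_id": "bm-1", "tissue": "bone marrow", "cell_number": 8000,
         "pcr_target": [{"pcr_target_locus": "IGH"}]},
    ],
}

print(matches(repertoire, query["filters"]))  # True: one repertoire, two samples
```

Under any-element semantics the two conditions could in principle be satisfied by different samples; whether a repository must require both on the same sample is a question for the ADC API specification, not something this sketch settles.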

All of the data in the IPA and AIRR COVID-19 repositories has a one sample per repertoire structure, so you would not find such data on these repositories. I know VDJServer has some data sets that have a more complex repertoire structure with multiple samples in a single repertoire... @schristley can probably point you at example repertoires on VDJServer that have this structure.

> If all our queries are at the single-sample-per-repertoire level, then translating things to the ADC API is really a one-to-one mapping, and even clones are easy to add.

This is exactly what we have done, and that is certainly a valid approach to take. We did it for the same reason, it makes life much simpler.
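A minimal Python sketch of the one-sample-per-repertoire approach: each database sample is wrapped as its own AIRR Repertoire whose "sample" array has exactly one element. The input row fields (id, name, subject, tissue) are hypothetical stand-ins, not ImmuneDB's actual schema, and the "Info" content is a placeholder.

```python
# Hypothetical ImmuneDB-style sample rows; the real ImmuneDB schema may differ.
samples = [
    {"id": 1, "name": "S1_blood", "subject": "P01", "tissue": "blood"},
    {"id": 2, "name": "S1_bm", "subject": "P01", "tissue": "bone marrow"},
]


def to_repertoire(sample):
    """Wrap one sample as its own AIRR Repertoire (single-element sample array)."""
    return {
        "repertoire_id": str(sample["id"]),
        "subject": {"subject_id": sample["subject"]},
        "sample": [{"sample_id": sample["name"], "tissue": sample["tissue"]}],
    }


def repertoire_response(samples):
    """Shape of an ADC API /repertoire response body: one Repertoire per sample."""
    return {
        "Info": {"title": "placeholder"},  # placeholder Info object
        "Repertoire": [to_repertoire(s) for s in samples],
    }
```

Because each repertoire_id maps to exactly one sample, any per-sample count in the database is already a per-repertoire count, which is what keeps the translation one-to-one.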

> But when there are multiple samples, even at the sequence level (before counting clones), it matters whether you are counting copies, unique sequences across samples, or unique sequences per sample (what we call instances).
>
> Once we get past this, the next issue is queries that involve multiple repertoires, but I think that can be dealt with once we have a working ADC API.

ADC API facets count the number of lines (records) in your rearrangement "table"/"collection". Each rearrangement can have a duplicate_count and/or a consensus_count, or neither.

        consensus_count:
            type: integer
            description: >
                Number of reads contributing to the (UMI) consensus for this sequence.
                For example, the sum of the number of reads for all UMIs that contribute to
                the query sequence.
        duplicate_count:
            type: integer
            description: >
                Copy number or number of duplicate observations for the query sequence.
                For example, the number of UMIs sharing an identical sequence or the number
                of identical observations of this sequence absent UMIs.

The Stats API provides mechanisms for counting rearrangements across these fields. Facets on the ADC API count the number of records, with the repository deciding whether it stores "instances" (which, if I understand correctly, would have a duplicate_count) or "raw" sequence annotations before duplicates and UMIs are taken into account (with no duplicate_count).
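To show why facet counts and copy counts can differ, here is a Python sketch over hypothetical collapsed rearrangement records. The request body uses the ADC API "facets" parameter, and the local facet_counts function mimics the {"Facet": [...]} response shape; treating a missing duplicate_count as a single observation is an assumption of this sketch, not ADC API behaviour.

```python
from collections import defaultdict

# Hypothetical collapsed rearrangement records ("instances"), each with a duplicate_count.
rearrangements = [
    {"repertoire_id": "1", "junction_aa": "CARDYW", "duplicate_count": 2},
    {"repertoire_id": "1", "junction_aa": "CASSLW", "duplicate_count": 1},
    {"repertoire_id": "2", "junction_aa": "CARDYW", "duplicate_count": 5},
]

# Body a client could POST to /airr/v1/rearrangement to get per-repertoire facets.
facet_request = {
    "filters": {"op": "=", "content": {"field": "junction_aa", "value": "CARDYW"}},
    "facets": "repertoire_id",
}


def facet_counts(records, field):
    """Count records per value of `field`, mirroring the {"Facet": [...]} shape."""
    counts = defaultdict(int)
    for r in records:
        counts[r[field]] += 1
    return {"Facet": [{field: k, "count": v} for k, v in sorted(counts.items())]}


def copy_totals(records, field):
    """Sum duplicate_count per value of `field`; a missing value counts as 1 (assumption)."""
    totals = defaultdict(int)
    for r in records:
        totals[r[field]] += r.get("duplicate_count", 1)
    return dict(totals)


print(facet_counts(rearrangements, "repertoire_id"))
print(copy_totals(rearrangements, "repertoire_id"))  # {'1': 3, '2': 5}
```

Repertoire "1" has two records but three copies, so a facet count reflects how the repository chose to store its data rather than the number of observed molecules.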

bcorrie commented 4 years ago

> @bcorrie we are in the testing stage: we are running the ADC API test suite and also building our own tests. Hopefully we will soon have a version of the implementation ready to be integrated into the ImmuneDB repository.

@areensh let us know if you have a partial ADC API implementation that you want us to test by connecting it to the iReceptor Gateway. If you have a /repertoire endpoint that is returning some data, we can easily connect it to our test Gateway and see if it works 8-) In particular, if it is returning one sample per repertoire then it should be easy...

bcorrie commented 3 years ago

@areensh @systemimmunologylab would it be possible to test connecting an ImmuneDB repository using the ADC API some time soon. By testing, I mean testing an alpha implementation that we don't expect to be 100% complete. I find the best way to find bugs is to connect the API to something like the Gateway which sends a bunch of queries...

We want to be able to report on the level of integration for the M24 deliverable, and I need to start writing that report soon.

areensh commented 3 years ago

@bcorrie sorry for the late response, I was unexpectedly out of office. We are still working on our testing, which didn't progress much last week. I will do my best to make it ready this week so you will be able to connect by next week; hopefully this is not too late for your deadline? Could you explain how exactly you are planning to connect through the Gateway, and are there any requirements from our side?

bcorrie commented 3 years ago

Hi @areensh no worries...

If you are using the ADC API repository tests, there is a specific set of query tests that provides good coverage of the queries the iReceptor Gateway sends to each repository. The iReceptor Gateway-specific tests are here:

https://github.com/airr-community/adc-api-tests/tree/master/rearrangement/ireceptor

If your repository passes those tests, then it should fundamentally work with iReceptor Gateway.

We would then just need the IP address or server name (with no firewall blocking access), and we would add it to the list of repositories the Gateway queries. For the purposes of this deliverable, we would add it only to our staging Gateway (gateway-staging.ireceptor.org) and run a set of tests to confirm it is working. We would not yet put it into production on the main iReceptor Gateway.

areensh commented 3 years ago

@bcorrie OK, then we also need to make sure you can access our server, and through which ports... I will update you by the end of the week.

bcorrie commented 3 years ago

@areensh I thought I would check in on where this is at... Have you been able to open the firewall, and if so, can you provide the IP address? I will then add it to the Staging Gateway and we can do an "alpha" integration test...

areensh commented 3 years ago

@bcorrie I will provide you with our server IP; I just want to make sure which port you will be connecting through, because our server contains a separate database for each study, not one database for multiple studies as in your structure, if I understand correctly.

bcorrie commented 3 years ago

@areensh if you tell us which port your HTTP server is listening on, we can tell the iReceptor Gateway to use that port. So if your server keeps different studies in different databases on different ports, then to test one study we would just need the IP address and the port of that study's database...

bcorrie commented 2 years ago

@areensh @systemimmunologylab we haven't heard from you for some time about the status of ImmuneDB's implementation of the ADC API. I need to report on repository statuses - can you let me know where things are at?

bcorrie commented 2 years ago

@areensh @systemimmunologylab I just tried to test the Haifa ADC API and I am still unable to connect:

    $ curl http://systemali.haifa.ac.il:8081/frontend/immunedb_paper/airr/v1/info
    curl: (7) Failed to connect to systemali.haifa.ac.il port 8081: Connection refused

Any insights?
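The curl check above can be scripted so that "nothing is listening" (firewall, wrong port, or server down) is distinguished from "the server answers but the endpoint misbehaves". A minimal Python sketch using only the standard library; the host, port, and per-study path are parameters, with the example values taken from the curl command above, and the diagnosis strings are arbitrary labels chosen for this sketch.

```python
import json
import socket
import urllib.error
import urllib.request


def info_url(host, port, base_path=""):
    """Build the ADC API /info URL for one study-specific deployment."""
    path = "/".join(p for p in (base_path.strip("/"), "airr/v1/info") if p)
    return f"http://{host}:{port}/{path}"


def probe(url, timeout=10.0):
    """Return a short diagnosis for a single GET of the /info endpoint."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            json.load(resp)  # /info is expected to return a JSON document
            return "ok"
    except urllib.error.HTTPError as e:  # must precede URLError (it is a subclass)
        return f"http {e.code}"  # server is up, endpoint returned an error
    except urllib.error.URLError as e:
        if isinstance(e.reason, ConnectionRefusedError):
            return "refused"  # nothing listening on that host:port
        return f"unreachable: {e.reason}"
    except ValueError:
        return "not json"  # server answered with something other than JSON
    except (socket.timeout, TimeoutError):
        return "timeout"  # connected, but the response stalled


url = info_url("systemali.haifa.ac.il", 8081, "/frontend/immunedb_paper/")
# probe(url) would return "refused" for the failure shown in the curl output.
```

A "refused" result points at the server or firewall rather than the ADC API implementation, which matches the situation described later in this thread.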

areensh commented 2 years ago

Hi Brian,

Yes, I am aware of that. The IT department is doing some work on the servers and the internal storage. I will update you once it’s done, I believe by the end of tomorrow.

Best

Areen


systemimmunologylab commented 2 years ago

It’s bad luck. We are installing some new hard drives in the server. Had you tried to access it before and it worked, or was this your first attempt?

Uri


bcorrie commented 2 years ago

My attempts to date have not yet worked... I believe something was not working even before the server update.

systemimmunologylab commented 2 years ago

Did you try since our last discussion, before the day before yesterday? Areen, can you make sure this is online ASAP and then contact Brian directly? Brian, the last time you tried to use the database, was the lack of response similar to the one you got yesterday (when the server was down for repairs), or was it something else?

best Uri


bcorrie commented 2 years ago

All of my attempts have been the same error - the repository was not responding to connections at all (server down???).

bcorrie commented 1 year ago

Partial implementation reported on in Deliverable, project completed - closing this issue.