legumeinfo / web-components

A collection of Web Components for interacting with and visualizing biological data.
https://legumeinfo.github.io/web-components/
Apache License 2.0

pangene lookup can break by 414: Request-URI Too Long #399

Open adf-ncgr opened 1 week ago

adf-ncgr commented 1 week ago

A gene list under the 100-item maximum can still break the tool internally ("Failed to load data") when its total character count exceeds a certain size (<2000). Since the current implementation requires full yuck in each input gene, the effective limit is smaller than 100. That is probably fine for now, but the mysterious errors received for gene lists in the gray zone are fairly unfriendly. Long term, we are going to need something that allows longer lists.
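One way to make the gray zone less mysterious would be a pre-flight check that reports a readable error before any request is sent. The following is only a sketch: the function names, the error wording, and the 2000-character budget are assumptions for illustration (the actual server limit is not stated in this thread), and it is not the component's real validation code.

```typescript
// Hypothetical limits: 100 comes from the issue text, 2000 is an assumed budget.
const MAX_GENES = 100;
const URI_BUDGET = 2000;

// Measure the list as it would appear in a URL: reserved characters such as
// spaces expand to three characters each once percent-encoded.
function encodedLength(genes: string[]): number {
  return encodeURIComponent(genes.join(" ")).length;
}

// Returns a human-readable problem description, or null if the list looks safe.
function checkGeneList(genes: string[]): string | null {
  if (genes.length > MAX_GENES) {
    return `Too many genes: ${genes.length} (max ${MAX_GENES})`;
  }
  const length = encodedLength(genes);
  if (length > URI_BUDGET) {
    return `Gene list is too long: ${length} encoded characters (budget ${URI_BUDGET})`;
  }
  return null;
}
```

With a check like this, a list of 50 long identifiers would be rejected with an explicit message instead of "Failed to load data", even though it is under the item limit.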

alancleary commented 1 week ago

Perhaps the solution here is to simply disable the querystring parameter functionality for this component. What do you think of this @adf-ncgr?

adf-ncgr commented 1 week ago

Yeah, I think we had talked about that once; I'm in favor.

adf-ncgr commented 4 days ago

After the code update to disable adding the long gene list to the query string, this is the error I see in the console. It seems to indicate that there's a long URI in a GET to the intermine server:

{message: '414: Request-URI Too Long', locations: Array(1), path: Array(1), extensions: {…}}
  extensions:
    code: "INTERNAL_SERVER_ERROR"
    response: {url: 'https://mines.legumeinfo.org/legumemine/service/qu…alue%3D%27gnm4%27%2F%3E%3C%2Fquery%3E&format=json', status: 414, statusText: 'Request-URI Too Long', body: '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">…limit for this server.<br />\n</p>\n</body></html>\n'}
      body: "<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML 2.0//EN\">\n<html><head>\n<title>414 Request-URI Too Long</title>\n</head><body>\n<h1>Request-URI Too Long</h1>\n<p>The requested URL's length exceeds the capacity\nlimit for this server.<br />\n</p>\n</body></html>\n"
      status: 414
      statusText: "Request-URI Too Long"
      url: "https://mines.legumeinfo.org/legumemine/service/q
      [[Prototype]]: Object
    [[Prototype]]: Object
  locations: [{…}]
  message: "414: Request-URI Too Long"
  path: ['panGenePairs']
  [[Prototype]]: Object
0 [{…}]

The query string update may have solved an earlier error, but apparently we're not out of the woods. @alancleary let me know if you want me to send you the gene list that triggers this

alancleary commented 4 days ago

@adf-ncgr Sorry for the delay. Yes, please send me the gene list. This result seems to indicate that the original error wasn't even related to the web component querystring parameters. If this is the case, then do we actually want to remove the querystring functionality from the component (via PR #404)?

adf-ncgr commented 4 days ago

OK, just emailed it to you. I'm not 100% sure whether or not the original error was the same as this, but I think it's probably still a good idea to keep the gene list out of the query string. It looked like in the new version all query parameters were being kept out, not just the long gene list, but that's probably fine. At this point, I'd say whatever is easiest for now and we can revisit later if needed.

alancleary commented 3 days ago

I was able to replicate the error using the gene list you sent. The stacktrace in the console clearly states that the error is coming from Intermine. This can be confirmed by searching the Intermine repo on GitHub for the error message, which reveals a comment in the code explaining that Intermine has a length limit on the URLs it serves. So the only ways I see around this are to further reduce the number of genes the lookup component accepts or to send requests to Intermine via POST. @adf-ncgr do you know if the latter is possible? I naively tried to convert our existing GET requests to POSTs but Intermine didn't like it.

adf-ncgr commented 3 days ago

I would have thought POSTing was an option, but I don't know for sure. That seems like the way to go if possible. I will look into it, but if we can't make that work in short order I guess we'll have to just reduce the number of genes as you suggest.

alancleary commented 3 days ago

Eavesdropping on the Intermine web interface reveals that imjs is in fact using POST requests. For some reason, though, I can't get the request structured correctly using the Apollo RESTDataSource API. This may be because the body of the request sent by imjs is encoded as a URL query string, rather than as serialized JSON...

adf-ncgr commented 3 days ago

Thanks for looking into it; if I understand what you're describing, it sounds like it should be solvable one way or another though perhaps not by using official RESTDataSource API means? Alternatively, we could try to tweak it on the intermine end, though that may be more involved. I've pretty much been in meetings all day but should have some time soon to look at it more closely, then maybe tomorrow we can discuss further.

alancleary commented 3 days ago

So apparently there are some headers that have to be included with the POST request, otherwise Intermine will try to parse it like a GET. Also, the way the body of the POST needs to be encoded as a string is very cryptic. I can successfully make a request using the RESTDataSource API if I hard-code the body from a POST request sent via the Intermine web interface, but if I attempt to programmatically encode the same string via the GraphQL server then it breaks. Egad, this is annoying!
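For readers following along, the general shape of a POST whose body is a URL-encoded query string (rather than JSON) can be sketched as below. This is a generic illustration, not the fix that was actually applied: the `buildFormPost` helper, the `query` parameter name, and the placeholder XML are hypothetical, and the thread does not say which headers Intermine requires, so the `Content-Type` shown is an assumption based on standard form encoding.

```typescript
// Shape of the request pieces; these could be handed to any HTTP client.
interface FormPost {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: percent-encodes each key and value and joins them
// with "&", so the parameters travel in the body instead of the URL.
function buildFormPost(params: Record<string, string>): FormPost {
  const body = Object.keys(params)
    .map((key) => encodeURIComponent(key) + "=" + encodeURIComponent(params[key]))
    .join("&");
  return {
    method: "POST",
    // Assumed header: marks the body as form-encoded rather than JSON.
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body,
  };
}

// Placeholder query XML; the parameter name "query" is an assumption,
// while "format=json" appears in the URL from the error above.
const request = buildFormPost({
  query: '<query name="example"></query>',
  format: "json",
});
```

Note that `encodeURIComponent` writes spaces as `%20`, whereas browser form submissions write `+`; whether a given server is strict about that difference is something to verify rather than assume.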

alancleary commented 3 days ago

It's working! I can successfully submit the complete gene list you sent me. I'll follow up via email regarding next steps.

adf-ncgr commented 3 days ago

AWESOME! glad you got it to work despite the need to do some olde-time cussing (are the kids really saying "egad" these days?) look forward to your next steps email