maximilianh / cellBrowser

main repo: https://github.com/ucscGenomeBrowser/cellBrowser/ - Python pipeline and Javascript scatter plot library for single-cell datasets, http://cellbrowser.rtfd.org
GNU General Public License v3.0

Cannot find dataset when refreshing #257

Open · samfenske opened this issue 1 year ago

samfenske commented 1 year ago

Hello, I have been able to upload a handful of datasets to a project, and oftentimes I have to clear the cache for a new dataset to appear. However, I noticed that when I clear the cache with a dataset open, or try to open a specific link referencing the dataset (e.g., the URL for the cellBrowser object with a certain gene selected), I get the following error:

"Could not find a dataset at 01integrated_bal_v5/dataset.json?19d85d6751. If you are sure that the link is correct, please contact the administrator of this server, or cells@ucsc.edu if this is running at UCSC."

Looking through the project's index.html file, I thought the MD5 code might have something to do with this, as the 10-character code in the error (19d85d6751) did not match the 10-character MD5 code in the file. I tried changing it and reloading the page, but received the same error. Am I on the right track here and just haven't synchronized the MD5 code, or is there another way I can get cellBrowser to recognize the dataset? I should clarify that the objects work perfectly fine when I open them; I only get this error when I clear the cache or open a specific link for the object.

Thanks!
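The 10-character code in the error is a cache-busting token. A minimal sketch of how such a token can be derived, assuming (this thread does not confirm it) that it is the first 10 hex digits of an MD5 digest of the file's contents; `md5_token` is a hypothetical helper, not a cellBrowser function. Because the token changes whenever the contents change, an updated file gets a new URL and the browser does not reuse a stale cached copy.

```python
import hashlib

def md5_token(data: bytes, length: int = 10) -> str:
    """Return the first `length` hex digits of the MD5 digest of `data`.

    Hypothetical helper: assumes the token after "?" is a truncated MD5
    of the file contents, which is not confirmed in this thread.
    """
    return hashlib.md5(data).hexdigest()[:length]

# Changing the file contents changes the token, so the URL looks "new"
# to the browser cache and the file is fetched again.
old = md5_token(b'{"name": "example"}')
new = md5_token(b'{"name": "example", "extra": 1}')
print(old, new, old != new)
print(md5_token(b""))  # → d41d8cd98f
```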

maximilianh commented 1 year ago

Thank you for this observation. Someone else has reported this too; it is not intended to work like this, but is a bug related to how the MD5 checksums are calculated. Can you give me a few more details?

Having an incorrect MD5 in the URL should not matter and should not trigger an error: it comes after the "?" character, so it is not part of the file path. The MD5 is there only to get around the caching.
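To illustrate the point about the "?": a short sketch using Python's standard `urllib.parse.urlsplit` on the failing request from the error message. Only the path identifies the file on the server; the query (the MD5) is not used to locate the file by a typical static-file server, and only makes the URL unique for caching purposes.

```python
from urllib.parse import urlsplit

# Relative URL taken verbatim from the error message in this issue.
failing = "01integrated_bal_v5/dataset.json?19d85d6751"

parts = urlsplit(failing)
print(parts.path)   # → 01integrated_bal_v5/dataset.json  (the file looked up)
print(parts.query)  # → 19d85d6751  (cache-busting token only)
```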

I think your problem is best described like this; let me know if this is not correct:

1) You add a dataset to a collection.
2) You go to the URL of the collection, but the dataset is not there.
3) You have to reload the page with Shift, or open it in another browser, to see it.

Is this correct? If so, then there must be a problem somewhere in how I calculate the MD5 for the collection.

samfenske commented 1 year ago

Maximilian, thank you for getting back to me! I can provide more specifics. When I go to the collection URL, the dataset is there, and I can refresh or clear the cache and the dataset will still be there. I can then open the dataset, but if I now try to refresh or clear the cache, I get the error. It seems the link is invalid: with a dataset opened and working, if I copy and paste the URL into another tab, I get the error.

maximilianh commented 1 year ago

Very odd. This is different from what I thought. Do you have an example of a failing URL?

(Email reply to Sam Fenske's comment of Mon, Nov 14, 2022 at 4:51 PM.)

samfenske commented 1 year ago

The collection is internal, so you can't access it without a specific VPN, but the URL is structured like this: ...northwestern.edu/long-covid/?ds=01integrated_BAL_v5. Some of our public datasets are available at https://nupulmonary.org/resources/, and their URLs are structured the same way.

maximilianh commented 1 year ago

I don't understand how this can happen... northwestern.edu/long-covid/?ds=01integrated_BAL_v5 doesn't have any MD5 in it. When you open the console, the failing URLs have the ?md5=xxxx in them, but that's OK: the files should still load, because the part after the "?" is ignored when the file is looked up.
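For comparison, the collection URL quoted above carries only the dataset name, in a `ds` query parameter, and no MD5. A sketch with `urllib.parse.parse_qs`; the host is left out because it is elided in this thread, and the mapping from `ds` to `<ds>/dataset.json` is inferred from the error message, not taken from cellBrowser's source.

```python
from urllib.parse import urlsplit, parse_qs

# Path and query from the thread; the internal host is elided there, so it is omitted here.
url = "long-covid/?ds=01integrated_BAL_v5"

params = parse_qs(urlsplit(url).query)
print(params)           # → {'ds': ['01integrated_BAL_v5']}
ds = params["ds"][0]
print("md5" in params)  # → False: no checksum in the page URL itself

# Assumed mapping (inferred from the error message): the viewer then
# requests <ds>/dataset.json, with the MD5 appended as a query string.
print(f"{ds}/dataset.json")  # → 01integrated_BAL_v5/dataset.json
```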

I wonder if this could have something to do with your web server... Does this also happen on nupulmonary.org? Do you know which web server you're using?

(Email reply to Sam Fenske's comment of Mon, Nov 14, 2022 at 5:26 PM.)

samfenske commented 1 year ago

I only get the MD5 code in the error message; it never actually appears in the URL. I haven't seen this happen on nupulmonary.org. I have gotten this error before on our internal browser, but that was a brand-new project, so I just made a new collection and it worked fine. This collection was working fine until last week; something must have happened when I was removing datasets. I removed the data folder and removed the dataset from the dataset.json file, which I've done before without any issues. The web server we use is Apache httpd 2.4.6.

maximilianh commented 1 year ago

I'm sorry, but unless I can see this happen, I can't say more. Can you send me a concrete list of steps so that I can reproduce this? We don't have the problem here, and it's definitely not related to your web server, but if I can't see the error in action and open the JavaScript console myself, I can't debug it. If this happens again, you could copy the entire folder to a hidden location on your public web server and let me debug it there; then I can say more. Or send me a list of steps to reproduce it.

If you remove a dataset from a collection, you need to remove the dataset from the data directory and also from the web server's htdocs directory, and then run cbBuild in the collection directory so that all the links are fixed and the dataset is actually removed from the collection. I imagine you know this. Still, if this didn't fix it, there must be a problem with the caching somewhere...
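The removal procedure described above can be sketched as a dry-run plan. This is a hypothetical helper, not part of cellBrowser: it only lists the three steps in order and deletes or runs nothing. It assumes that cbBuild's `-o` option names the htdocs output directory (check `cbBuild --help` on your install); all paths are placeholders.

```python
from pathlib import Path

def plan_dataset_removal(dataset_dir: Path, built_dir: Path,
                         collection_dir: Path, htdocs_dir: Path) -> list:
    """Return the removal steps in order, as (action, target, cwd) tuples.

    Hypothetical dry-run helper: nothing is deleted or executed here.
    """
    return [
        # 1. Remove the dataset's input folder from the data directory.
        ("delete_dir", dataset_dir, None),
        # 2. Remove the built copy from the web server's htdocs directory.
        ("delete_dir", built_dir, None),
        # 3. Rebuild from the collection directory so the collection's
        #    dataset.json and links are regenerated. Assumes cbBuild's -o
        #    option points at the htdocs output directory.
        ("run", ["cbBuild", "-o", str(htdocs_dir)], collection_dir),
    ]

# Placeholder paths, for illustration only.
steps = plan_dataset_removal(
    dataset_dir=Path("long-covid/01integrated_bal_v5"),
    built_dir=Path("/var/www/html/long-covid/01integrated_bal_v5"),
    collection_dir=Path("long-covid"),
    htdocs_dir=Path("/var/www/html"),
)
for action, target, cwd in steps:
    print(action, target, cwd)
```

To actually carry out the plan, `delete_dir` would map to `shutil.rmtree(target)` and `run` to `subprocess.run(target, cwd=cwd, check=True)`; keeping it as a dry run makes it easy to review the paths before anything is deleted.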

(Email reply to Sam Fenske's comment of Mon, Nov 14, 2022 at 9:38 PM.)