haniffalab / webatlas-pipeline

A data pipeline built in Nextflow to process spatial and single-cell experiment data for visualisation in WebAtlas
MIT License

WebAtlas pipeline and WebAtlas App #108

Closed armaos closed 1 year ago

armaos commented 1 year ago

Hello, and thank you for the great tool. I have set up the WebAtlas App server on an AWS EC2 instance and I can access it through the public IP on port 3000: http://<IP>:3000

I have also set up the WebAtlas pipeline on the same AWS EC2 instance and run one of the examples (Visium: High resolution mapping of the breast cancer tumor microenvironment using integrated single cell, spatial and in situ analysis of FFPE tissue). As mentioned in the documentation, I serve the data with serve or npx. Since the data is on the same EC2 instance and port 3000 is already occupied by the WebAtlas App, I am successfully running the pipeline server on another port, 3001:

npx http-server output/CytAssist_FFPE_Human_Breast_Cancer/0.3.2 --port 3000 --cors

I can confirm that I can access and see the content of the config file at http://<IP>:3001/visium-breast-cancer-config.json

My problem: I am trying to visualize the example through the WebAtlas App I have set up:

http://<IP>:3000/?theme=dark&config=http://<IP>:3001/visium-breast-cancer-config.json

But I cannot. Although it loads the title (only), "Visium CytAssist - High resolution mapping of the breast cancer tumor microenvironment", I can't see anything else, and it complains about CORS. Is there a way to enable it?

image

Similarly, when I access it through webatlas.cog.sanger.ac.uk: https://webatlas.cog.sanger.ac.uk/latest/index.html?config=http://localhost:3001/visium-breast-cancer-config.json

I cannot load it, this time with a different error: SyntaxError: Unexpected token '<', "<!DOCTYPE "... is not valid JSON

image

However, if I stop the WebAtlas App and just serve the pipeline output on port 3000, then I can successfully access it through webatlas.cog.sanger.ac.uk: https://webatlas.cog.sanger.ac.uk/latest/index.html?config=http://localhost:3000/visium-breast-cancer-config.json

However, it complains again if I substitute localhost with the real IP. But why? https://webatlas.cog.sanger.ac.uk/latest/index.html?config=http://IP:3001/visium-breast-cancer-config.json fails with this error:

App.js:69 Mixed Content: The page at 'https://webatlas.cog.sanger.ac.uk/latest/index.html?config=http://44.204.174.73:3000/visium-breast-cancer-config.json' was loaded over HTTPS, but requested an insecure resource 'http://44.204.174.73:3000/visium-breast-cancer-config.json'. This request has been blocked; the content must be served over HTTPS.

I guess there is something implicit about port 3000 that has to stay. As a solution, I am thinking of setting up two EC2 machines, one with the app and one with the pipeline, both running on port 3000, and accessing it through: http://<IP_1>:3000/?theme=dark&config=http://<IP_2>:3000/visium-breast-cancer-config.json I am not sure, however, that this will work. Maybe I should also put the pipeline behind HTTPS.

Any help? Thank you so much, A.

prete commented 1 year ago

Ah, you're right, you will need to set CORS on your bucket if you're serving the app from a different domain than the data. You should be able to use something like this cors.json and apply it to your bucket.

[
    {
        "AllowedHeaders": [
            "*"
        ],
        "AllowedMethods": [
            "HEAD",
            "GET"
        ],
        "AllowedOrigins": [
            "*"
        ]
    }
]

armaos commented 1 year ago

I see your point. However, this solution, if I understand it correctly, is for setting CORS on the AWS bucket that holds the data. In my scenario, the app and the data are on a Linux EC2 instance.

prete commented 1 year ago

Gotcha, what command are you using to serve those two? If it's the npx one, can I check whether you're using the option npx http-server .... --cors='*'?

armaos commented 1 year ago

For the webatlas-pipeline I am using: npx http-server output/CytAssist_FFPE_Human_Breast_Cancer/0.3.2 --port 3000 --cors as stated in the

But I think the problem with CORS is not in the webatlas-pipeline, but in the webatlas-app, which runs with npm start.

As a reminder, I have webatlas-app and webatlas-pipeline set up locally on the same EC2 machine.

davehorsfall commented 1 year ago

Hey @armaos

Thanks for the detailed description of your setup and problem.

You said you are running your WebAtlas App over port 3000. This is the default port when running npm start, so this makes sense.

You've also said you're serving the data output from your pipeline on port 3001 using the command: npx http-server output/CytAssist_FFPE_Human_Breast_Cancer/0.3.2 --port 3000 --cors

However, please be aware that the above command would serve the data on port 3000, not 3001. You'd need to change the port number to 3001, i.e. npx http-server output/CytAssist_FFPE_Human_Breast_Cancer/0.3.2 --port 3001 --cors

I'm not sure if that was just a typo, because you confirmed you could access the JSON file at http://<IP>:3001/visium-breast-cancer-config.json. However, please double-check those commands and make sure you can see what you expect at both ports after you start both services. All of the error messages you've posted suggest that the data isn't being served over port 3001.
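One way to check what each port is actually returning is sketched below in Python. It is only an illustration, not part of WebAtlas: the function names are made up for this example. It relies on the fact that the error "Unexpected token '<', "<!DOCTYPE "... is not valid JSON" means the response body was an HTML page rather than the JSON config.

```python
import json
import urllib.request


def classify_body(body: bytes) -> str:
    """Return 'json', 'html', or 'other' for a raw response body.

    Hypothetical helper for this thread, not a WebAtlas function.
    """
    text = body.decode("utf-8", errors="replace").lstrip()
    try:
        json.loads(text)
        return "json"
    except json.JSONDecodeError:
        pass
    # An HTML page here usually means a different service answered on this port.
    if text.lower().startswith(("<!doctype", "<html")):
        return "html"
    return "other"


def check_config_url(url: str, timeout: float = 5.0) -> str:
    """Fetch the URL and report what kind of content came back."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return classify_body(response.read())
```

Calling check_config_url("http://<IP>:3001/visium-breast-cancer-config.json") from the EC2 instance should return "json" if the data server is listening on that port; "html" would suggest that another service is answering instead.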

davehorsfall commented 1 year ago

@armaos, I forgot to mention - there is no requirement in the webatlas-app for CORS. It is only the web service that you're loading data from (i.e. the data output from the pipeline) that needs to allow CORS.
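To illustrate what "allowing CORS" means on the data side, here is a minimal Python sketch of a static file server that adds the Access-Control-Allow-Origin header to every response. It is an assumed stand-in for npx http-server --cors, not what the pipeline docs prescribe, and the class and function names are invented for this example. It does not handle preflight (OPTIONS) requests, which a real deployment may need.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class CORSRequestHandler(SimpleHTTPRequestHandler):
    """Static file handler that lets any origin read its responses."""

    def end_headers(self):
        # This header is what allows a page on another origin to read the data.
        self.send_header("Access-Control-Allow-Origin", "*")
        super().end_headers()


def make_server(directory: str, port: int = 3001) -> ThreadingHTTPServer:
    """Build (but do not start) a CORS-enabled server for `directory`."""
    handler = partial(CORSRequestHandler, directory=directory)
    return ThreadingHTTPServer(("0.0.0.0", port), handler)
```

For example, make_server("output/CytAssist_FFPE_Human_Breast_Cancer/0.3.2", 3001).serve_forever() would serve the pipeline output on port 3001. The app itself needs no such header, because it is the page making the requests, not the server answering them.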

armaos commented 1 year ago

Hello Dave, Thank you so much for your answer! I appreciate your valuable time.

As for your comment: yes, I was running the pipeline server correctly with npx http-server output/CytAssist_FFPE_Human_Breast_Cancer/0.3.2 --port 3001 --cors (I had made a typo while writing the issue).

I have tried multiple scenarios. In order not to mix them up, I will just mention here the case where:

I don't run the WebAtlas app myself, but instead use the one at https://webatlas.cog.sanger.ac.uk/latest/index.html.

WORKING: 1a) If I use localhost for the data from the webatlas-pipeline and access it at https://webatlas.cog.sanger.ac.uk/latest/index.html?theme=dark&config=http://127.0.0.1:3000/visium-breast-cancer-config.json, this works great, as it should. However, since this is on my localhost, I cannot share this link with colleagues so they can see the same data as me, which is rather important in my opinion. To share with them, I would of course have to use the actual IP of the machine that hosts the data. And this leads here:

PROBLEMATIC: 1b) If I serve with npx http-server output/CytAssist_FFPE_Human_Breast_Cancer/0.3.2/ --port 3000 --cors and use the IP, then the good thing is that I can see the JSON and all the files here: http://IP:3000/visium-breast-cancer-config.json. Great. But if I try to access https://webatlas.cog.sanger.ac.uk/latest/index.html?theme=dark&config=http://IP:3000/visium-breast-cancer-config.json, it complains (console message):

The page at 'https://webatlas.cog.sanger.ac.uk/latest/index.html?theme=dark&config=http://IP:3000/visium-breast-cancer-config.json' was loaded over HTTPS, but requested an insecure resource 'http://IP:3000/visium-breast-cancer-config.json'. This request has been blocked; the content must be served over HTTPS.

So, 1c) OK, I thought I would set up SSL for http-server and serve over HTTPS instead. Indeed, serving with npx http-server output/CytAssist_FFPE_Human_Breast_Cancer/0.3.2/ --port 3000 --cors -S -C cert.pem, I can see the data etc. over HTTPS: https://IP:3000/visium-breast-cancer-config.json. Great, the https works..!

BUT when I try to go to https://webatlas.cog.sanger.ac.uk/latest/index.html?theme=dark&config=https://IP:3000/visium-breast-cancer-config.json the visualisation doesn't work. Strangely, the WebAtlas app at Sanger picks up the *.json correctly from IP:3000, but when it tries to load the rest of the files (.zattrs, .zmetadata etc.), it looks for them on localhost and not at the IP provided. It is as if it is hardcoded to look at localhost and does not replace localhost with the custom IP provided in the URL. See the image for the requested URL of the .zattrs file:

image

davehorsfall commented 1 year ago

Thanks. This helps understand what is happening.

> The page at 'https://webatlas.cog.sanger.ac.uk/latest/index.html?theme=dark&config=http://IP:3000/visium-breast-cancer-config.json' was loaded over HTTPS, but requested an insecure resource 'http://ip:3000/visium-breast-cancer-config.json'. This request has been blocked; the content must be served over HTTPS.

As you've already determined, your browser won't allow insecure (HTTP) content to be loaded from a secure (HTTPS) page. When the insecure content comes from localhost, your browser won't mind too much, but when you switch to your public IP the browser will block it.

> OK, I thought I would set up SSL for http-server and serve over HTTPS instead

Yes, serving over secure https will help fix that problem.
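The rule described above can be approximated with a short Python sketch. This is a deliberate simplification for illustration: real browsers follow more detailed rules, and the exemption for loopback addresses varies between browsers. The function name and the set of loopback hosts are assumptions made for this example.

```python
from urllib.parse import urlsplit

# Hosts assumed to be treated as trustworthy even over plain HTTP.
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def is_blocked_mixed_content(page_url: str, resource_url: str) -> bool:
    """Approximate whether a browser would block the request as mixed content."""
    page = urlsplit(page_url)
    resource = urlsplit(resource_url)
    if page.scheme != "https":
        return False  # an HTTP page may load HTTP resources
    if resource.scheme != "http":
        return False  # an HTTPS resource is fine on an HTTPS page
    # HTTPS page plus HTTP resource: only loopback hosts get through.
    return resource.hostname not in LOOPBACK_HOSTS
```

Under these assumptions, an HTTPS page requesting http://IP:3000/... is blocked, while the same page requesting http://localhost:3000/... or https://IP:3000/... is not, which matches the behaviour reported in scenarios 1a to 1c.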

> Great, the https works..!

Good job!

> Strangely, the WebAtlas app at Sanger picks up the *.json correctly from IP:3000, but when it tries to load the rest of the files (.zattrs, .zmetadata etc.), it looks for them on localhost and not at the IP provided.

What's happening here is that the "localhost" address is hardcoded in the visium-breast-cancer-config.json file. When you run the pipeline (which generates the config files), you specify the URL of the data. For more information about this, please see the url parameter in the docs here: https://haniffalab.com/webatlas-pipeline/configuration.html#dataset

To correct this, you can update your configuration file for the Nextflow pipeline, and run it again, so the visium-breast-cancer-config.json file includes the correct url. It will be something like this:

projects:
  - project: visium
    datasets:
      - dataset: breast-cancer
        title: "Visium CytAssist - High resolution mapping of the breast cancer tumor microenvironment"
        url: 'https://IP:3000'
        data:
          -
            data_type: spaceranger
            data_path: ./input/CytAssist_FFPE_Human_Breast_Cancer/
          -
            data_type: raw_image
            data_path: ./input/CytAssist_FFPE_Human_Breast_Cancer/tissue_image.tif

I've just realised this isn't well documented in the example, and the sample CytAssist_FFPE_Human_Breast_Cancer.yaml file doesn't include the url parameter. In this case, it just uses localhost, which is why it works, but I will update the docs to make all this clearer for users. I've opened #117 to action this.
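To see which URLs inside a generated config still point at localhost, a small Python sketch like the following could be used. It makes no assumption about the layout of the config file: it simply walks every value in the JSON. The function names are hypothetical and this is not part of the pipeline.

```python
import json
from urllib.parse import urlsplit

LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def find_loopback_urls(node, path="$"):
    """Return (json_path, url) pairs for every URL whose host is loopback."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            found += find_loopback_urls(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found += find_loopback_urls(value, f"{path}[{index}]")
    elif isinstance(node, str) and node.startswith(("http://", "https://")):
        try:
            host = urlsplit(node).hostname
        except ValueError:
            host = None  # malformed URL: ignore rather than fail
        if host in LOOPBACK_HOSTS:
            found.append((path, node))
    return found


def scan_config_file(config_path: str):
    """Load a JSON config from disk and list its loopback URLs."""
    with open(config_path, encoding="utf-8") as handle:
        return find_loopback_urls(json.load(handle))
```

Running scan_config_file on visium-breast-cancer-config.json before and after regenerating it with the url parameter set should show the localhost entries disappear.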

Please let me know if the above helps. Thanks.

armaos commented 1 year ago

Hello Dave! Neat answer, thank you very much. That was the trick!!

Just one more thing to add. You could mention in the documentation that HTTPS can be enabled through http-server by setting up the certificate and adding the corresponding parameters to the command (-S -C cert.pem): npx http-server output/CytAssist_FFPE_Human_Breast_Cancer/0.3.2/ --port 3000 --cors -S -C cert.pem

Brilliant! I'll close this.