poloclub / wizmap

Explore and interpret large embeddings in your browser with interactive visualization! 📍
https://poloclub.github.io/wizmap/
MIT License

Example with localhost URLs #2

Closed WhiteTeaDragon closed 1 year ago

WhiteTeaDragon commented 1 year ago

Hello!

I have a small request: could you please give an example of accessing files via localhost-URLs? I have a notebook running on localhost:8888, from within the "wizmap" folder. I have tried to use the following URLs for my data:

http://localhost:8888/view/example/smaller_data.ndjson
file:///Users/alexsend57/Documents/wizmap/example/smaller_data.ndjson

However, in both cases the visualisation does not work (points are not appearing). I have checked that there is no problem with the data itself by uploading it on GitHub and running the visualisation with public URLs -- it worked perfectly.

xiaohk commented 1 year ago

Good question! I will update the README to add more examples. If you are using a Jupyter server to host files, you need to use its special /files/ URL to reach a file. For example, ./test.txt has the localhost URL http://localhost:8888/files/test.txt (see the video below).

https://github.com/poloclub/wizmap/assets/15007159/6be1d4c7-32e0-41d7-96dd-006ee43d2d9e

If you are using other web servers (e.g., python3 -m http.server), then the file should be accessible directly at http://localhost:8000/test.txt.
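The two URL conventions described above can be sketched as a small helper. This is only an illustration, assuming the default ports mentioned in this thread (8888 for Jupyter, 8000 for python3 -m http.server) and that paths are relative to the directory the server was started in; the function names are hypothetical and not part of WizMap.

```python
from urllib.parse import quote


def _clean(relative_path: str) -> str:
    """Drop a leading "./" or "/" so the path joins cleanly onto a URL."""
    if relative_path.startswith("./"):
        relative_path = relative_path[2:]
    return relative_path.lstrip("/")


def jupyter_file_url(relative_path: str, port: int = 8888) -> str:
    """URL of a raw file served by a Jupyter server (note the /files/ prefix)."""
    return f"http://localhost:{port}/files/{quote(_clean(relative_path))}"


def http_server_url(relative_path: str, port: int = 8000) -> str:
    """URL of a file served by `python3 -m http.server` (no prefix)."""
    return f"http://localhost:{port}/{quote(_clean(relative_path))}"
```

For example, jupyter_file_url("./test.txt") gives http://localhost:8888/files/test.txt, while http_server_url("./test.txt") gives http://localhost:8000/test.txt.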

WhiteTeaDragon commented 1 year ago

Thank you! I am starting my Jupyter notebook with python3 -m notebook. I can download the file manually from my browser via the link http://localhost:8888/view/example/smaller_data.ndjson. However, when I try to download it from inside the notebook via requests.get('http://localhost:8888/view/example/smaller_data.ndjson'), it does not work, probably for the same reason the visualisation fails. I also tried removing 'view' from the path, but that does not work either. Do you have any advice?

xiaohk commented 1 year ago

Can you try http://localhost:8888/files/example/smaller_data.ndjson?

WhiteTeaDragon commented 1 year ago

Yes, I've tried it. It does not work either.

WhiteTeaDragon commented 1 year ago

I have restarted my jupyter notebook without token, thinking it might be the problem. However, it did not help.

WhiteTeaDragon commented 1 year ago

Actually, when I use 'files', I get the error 403-Forbidden.

xiaohk commented 1 year ago

Got it. How about http://localhost:8888/files/example/smaller_data.ndjson?token=xxxxxxx, where the xxxxx is your Jupyter token from the terminal.
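A minimal sketch of appending the token as a query parameter, as suggested above. The helper name is hypothetical, and the token value must come from your own Jupyter terminal output. Whether the token alone is enough depends on the server configuration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_jupyter_token(url: str, token: str) -> str:
    """Return `url` with a `token=<token>` query parameter appended."""
    parts = urlsplit(url)
    # Preserve any query parameters that are already present.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("token", token))
    return urlunsplit(parts._replace(query=urlencode(query)))
```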

WhiteTeaDragon commented 1 year ago

I have removed the token already. Currently, the setup is like this:

image
WhiteTeaDragon commented 1 year ago

After I restarted the notebook with option --NotebookApp.disable_check_xsrf=True, the link with 'files' started working. Thank you for your help!

xiaohk commented 1 year ago

Thanks for sharing your solution! It will be very helpful for other users as well!

I guess that by default, Jupyter users also need to include the xsrf string in their URL. They can get the string by right-clicking the file and selecting "Copy Download Link" in JupyterLab.

Chao0511 commented 1 year ago

Hello,

I'm using jupyter notebook on a remote server. Inside the notebook, I can download via 1) requests.get('http://10.206.215.206:8002/data.ndjson') 2) urllib.request.urlretrieve('http://10.206.215.206:8002/data.ndjson', "test.ndjson"). In a browser, I can download manually via 'http://10.206.215.206:8002/data.ndjson'. However, I still get 0 data points in wizmap.visualize. Would you have any advice on how to run wizmap on a remote server? thank you.

Chao0511 commented 1 year ago

P.S. Regarding your example files 'https://huggingface.co/datasets/xiaohk/embeddings/resolve/main/imdb/data.ndjson' and 'https://huggingface.co/datasets/xiaohk/embeddings/resolve/main/imdb/grid.json': when I download them and put them in my GitHub repository as 'https://github.com/Chao0511/data/blob/main/data_hug.ndjson' and 'https://github.com/Chao0511/data/blob/main/grid.json', wizmap.visualize shows 0 points, too. Did I do anything wrong? I didn't modify any file. Thank you.

WhiteTeaDragon commented 1 year ago

If you are using GitHub, you should use links to the raw data, e.g. https://raw.githubusercontent.com/Chao0511/data/main/data_hug.ndjson
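The conversion from a GitHub page link to a raw link can be sketched as follows. This is an illustration only: the function name is hypothetical, and it assumes the usual owner/repo/blob/ref/path layout of GitHub file links. It also cleans up a raw.githubusercontent.com URL that still contains the /blob/ segment.

```python
from urllib.parse import urlsplit


def to_raw_github_url(url: str) -> str:
    """Rewrite a github.com "blob" link to its raw.githubusercontent.com form."""
    parts = urlsplit(url)
    if parts.netloc not in ("github.com", "raw.githubusercontent.com"):
        return url  # not a GitHub link; leave it untouched
    segments = [s for s in parts.path.split("/") if s]
    # Expected layout: owner / repo / blob / ref / path...
    if len(segments) >= 4 and segments[2] == "blob":
        del segments[2]
    elif parts.netloc == "github.com":
        return url  # some other kind of GitHub page; leave it untouched
    return "https://raw.githubusercontent.com/" + "/".join(segments)
```

For instance, https://github.com/Chao0511/data/blob/main/data_hug.ndjson becomes https://raw.githubusercontent.com/Chao0511/data/main/data_hug.ndjson.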

WhiteTeaDragon commented 1 year ago

If we are talking about a = requests.get('http://10.206.215.206:8002/data.ndjson') — have you looked at a.content? It should be not the HTML file, but the raw data. Is it the case?
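One way to automate this check is sketched below. It assumes the data file is NDJSON (one JSON value per line) and only inspects the first few lines; the function name and the max_lines parameter are hypothetical.

```python
import json


def looks_like_ndjson(content: bytes, max_lines: int = 5) -> bool:
    """Heuristically check that a response body is NDJSON rather than an HTML page."""
    text = content.decode("utf-8", errors="replace").lstrip()
    # An HTML viewer page (e.g. a GitHub "blob" page) starts with markup.
    if text.lower().startswith(("<!doctype", "<html")):
        return False
    lines = [line for line in text.splitlines() if line.strip()][:max_lines]
    if not lines:
        return False
    for line in lines:
        try:
            json.loads(line)  # every NDJSON line must be valid JSON on its own
        except json.JSONDecodeError:
            return False
    return True
```

With the requests call from this thread, the check would be looks_like_ndjson(a.content).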

Chao0511 commented 1 year ago

yes. a.content gives me:

b'[6.1994218826293945, 5.6783976554870605, "1"]\n[4.8066792488098145, 2.9193058013916016, "5"]\n[6.91471004486084, 2.46600079536438, "3"]\n[4.565587520599365, 2.32009220123291, "13"]\n[2.6964783668518066, 1.7311012744903564, "4"]\n[6.540163040161133, 3.6321873664855957, "13"]\n[7.043384075164795, 3.2535653114318848, "2"]\n[2.1697118282318115, 1.8707449436187744, "4"]\n[4.935248374938965, 2.7773969173431396, "5"]\n[3........

Chao0511 commented 1 year ago

Thank you, but it still doesn't work.

data_url = 'https://raw.githubusercontent.com/Chao0511/data/blob/main/data_hug.ndjson'
grid_url = 'https://raw.githubusercontent.com/Chao0511/data/blob/main/grid_hug.json'
wizmap.visualize(data_url, grid_url, height=700)

0 data point

WhiteTeaDragon commented 1 year ago

And does it work with the original links from huggingface?

Chao0511 commented 1 year ago

Yes, the original Hugging Face links work perfectly.


xiaohk commented 1 year ago

Hi @Chao0511, you can also use the web app at https://poloclub.github.io/wizmap to load these json files by clicking the file button on the bottom right.

I added these two URLs, and it seems to work:

image

Data: https://raw.githubusercontent.com/Chao0511/data/main/data_hug.ndjson
Grid: https://raw.githubusercontent.com/Chao0511/data/main/grid_hug.json

The sharable link is: https://poloclub.github.io/wizmap/?dataURL=https%3A%2F%2Fraw.githubusercontent.com%2FChao0511%2Fdata%2Fmain%2Fdata_hug.ndjson&gridURL=https%3A%2F%2Fraw.githubusercontent.com%2FChao0511%2Fdata%2Fmain%2Fgrid_hug.json
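The sharable link above is the web app URL with the two file URLs percent-encoded into the dataURL and gridURL query parameters. A minimal sketch of building such a link (the function name is hypothetical; the parameter names are taken from the link above):

```python
from urllib.parse import urlencode

WIZMAP_APP_URL = "https://poloclub.github.io/wizmap/"


def build_share_url(data_url: str, grid_url: str, app_url: str = WIZMAP_APP_URL) -> str:
    """Build a WizMap web app link that loads the given data and grid files."""
    # urlencode percent-encodes ":" and "/" so each URL survives as one parameter value.
    query = urlencode({"dataURL": data_url, "gridURL": grid_url})
    return f"{app_url}?{query}"
```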

xiaohk commented 1 year ago

Did you see any errors in the web console? Make sure you are providing both data.ndjson and grid.json.

Chao0511 commented 1 year ago
  1. When running with the GitHub files in the web console, both the Hugging Face files and my generated files work, thank you!
  2. On my remote server, when I run npm run dev, I get this error:

wizmap/node_modules/vite/bin/vite.js:2
import { performance } from 'node:perf_hooks'
       ^

SyntaxError: Unexpected token {
    at Module._compile (internal/modules/cjs/loader.js:723:23)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:789:10)
    at Module.load (internal/modules/cjs/loader.js:653:32)
    at tryModuleLoad (internal/modules/cjs/loader.js:593:12)
    at Function.Module._load (internal/modules/cjs/loader.js:585:3)
    at Function.Module.runMain (internal/modules/cjs/loader.js:831:12)
    at startup (internal/bootstrap/node.js:283:19)
    at bootstrapNodeJSCore (internal/bootstrap/node.js:623:3)
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! wizmap@0.1.1 dev: vite --port 3000
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the wizmap@0.1.1 dev script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR! /.npm/_logs/2023-06-23T17_10_07_691Z-debug.log

So I cannot have a web console on my server.

xiaohk commented 1 year ago

Got it, there are many different ways to use WizMap from a remote server 😆

  1. Host the data from the remote server, host WizMap from the local machine
  2. Host the data from the remote server, use https://poloclub.github.io/wizmap to read data
  3. Host the data from the remote server, use WizMap from a local Jupyter server
  4. Host both the data and WizMap from the remote server
  5. More...

It seems you are trying method 4? Perhaps try to run npm install first before npm run dev?

Chao0511 commented 1 year ago

Yes, both WizMap and the data I'm using are on my remote server (i.e. method 4). I did run npm install before running npm run dev. npm install gave me many warnings; for example, the last lines of its output are:

npm WARN notsup SKIPPING OPTIONAL DEPENDENCY: Unsupported platform for esbuild-windows-arm64@0.15.18: wanted {"os":"win32","arch":"arm64"} (current: {"os":"linux","arch":"x64"})
npm WARN optional SKIPPING OPTIONAL DEPENDENCY: fsevents@2.3.2 (node_modules/fsevents):
npm WARN notsup SKIPPING OPTIONAL DEPENDENCY: Unsupported platform for fsevents@2.3.2: wanted {"os":"darwin","arch":"any"} (current: {"os":"linux","arch":"x64"})

audited 440 packages in 19.037s

52 packages are looking for funding
  run npm fund for details

found 1 moderate severity vulnerability
  run npm audit fix to fix them, or npm audit for details

Does it mean npm install ran successfully? because there's no error.

Then npm run dev gave me the error above.

Thank you. :)

xiaohk commented 1 year ago

I see, what's your node.js version on the remote server? You can run node -v to check.

Perhaps try to update it to the current stable version (v18)?

Related issue: https://github.com/microsoft/JARVIS/issues/131
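The syntax error on import { performance } from 'node:perf_hooks' is what an older Node.js release prints when it meets ES module syntax it cannot parse. A small sketch for checking a version string against a minimum is shown below. The threshold of 14.18.0 is an assumption based on the Node.js requirement published for Vite 3; adjust it to your setup. The function names are hypothetical.

```python
import re

# Assumption: minimum Node.js version for Vite 3 (14.18+); not stated in this thread.
MIN_NODE = (14, 18, 0)


def parse_node_version(text: str) -> tuple:
    """Extract (major, minor, patch) from strings such as "v10.19.0"."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def node_is_new_enough(text: str, minimum: tuple = MIN_NODE) -> bool:
    """True if the reported Node.js version is at least `minimum`."""
    return parse_node_version(text) >= minimum
```

The input can be the output of node -v, for example obtained with subprocess.run(["node", "-v"], capture_output=True, text=True).stdout.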

Chao0511 commented 1 year ago

thank you. I got "nodejs is already the newest version (10.19.0~dfsg-3ubuntu1)."

xiaohk commented 1 year ago

Hmm, does it mean the node version is 10.19? The latest LTS version is 18.16.1.

Check out this question: https://askubuntu.com/questions/1354638/update-nodejs-from-10-19-0-to-14-17-3-lts-in-20-04

It seems you need to run some special instructions to update node on Ubuntu:

Thanks, even after trying the command as root, it says nodejs is already the newest version (10.19.0~dfsg-3ubuntu1).

That just means it's the latest version of nodejs that is available in your ubuntu repo. To update beyond what is included with your version of ubuntu (20.04?), you need to use the above method offered by

Chao0511 commented 1 year ago

hello,

Thank you for your answer.

I updated the node version to v18.16.1. And then I could run npm run dev, its output is:

wizmap@0.1.1 dev
vite --port 3000

VITE v3.2.7 ready in 1851 ms

➜ Local: http://localhost:3000/
➜ Network: use --host to expose
9:39:53 PM [vite-plugin-svelte] /src/components/mapview/MapView.svelte:70:0 Unused CSS selector ".main-app-container.hidden"
9:39:59 PM [vite-plugin-svelte] /src/components/footer/Footer.svelte:250:0 Unused CSS selector "dialog[open] .row-block .row-name"
9:40:04 PM [vite] ✨ new dependencies optimized: flexsearch

And in my browser, I can open the web console which is on the remote server.

However, I cannot visualize any points. With the web console running on the server and the data and grid files also on the server, which URLs should I pass in the web console? Again, I can download the files via 1) requests.get('http://10.206.215.206:8002/data.ndjson') 2) urllib.request.urlretrieve('http://10.206.215.206:8002/data.ndjson', "test.ndjson").

Thank you so much.

xiaohk commented 1 year ago

Hmm I think you should be able to see points. You would need to click the folder icon on the bottom right of WizMap => copy http://10.206.215.206:8002/data.ndjson to Data, and http://10.206.215.206:8002/grid.json to Grid => click create.

Can you see contour and embedding summaries in the visualization?

https://github.com/poloclub/wizmap/assets/15007159/2ac5309e-392f-4dd5-8616-d3433e2b4048

Chao0511 commented 1 year ago

Thank you, that's what I'm doing, and I can only see the blank web console, it seems files are not passed.

Besides, I got:

"GET /grid.json HTTP/1.1" 200 -

Exception occurred during processing of request from ('10.48.132.64', 52009)
Traceback (most recent call last):
  File "/opt/anaconda/anaconda3/envs/repre/lib/python3.10/socketserver.py", line 683, in process_request_thread
    self.finish_request(request, client_address)
  File "/opt/anaconda/anaconda3/envs/repre/lib/python3.10/http/server.py", line 1304, in finish_request
    self.RequestHandlerClass(request, client_address, self,
  File "/opt/anaconda/anaconda3/envs/repre/lib/python3.10/http/server.py", line 668, in __init__
    super().__init__(*args, **kwargs)
  File "/opt/anaconda/anaconda3/envs/repre/lib/python3.10/socketserver.py", line 747, in __init__
    self.handle()
  File "/opt/anaconda/anaconda3/envs/repre/lib/python3.10/http/server.py", line 433, in handle
    self.handle_one_request()
  File "/opt/anaconda/anaconda3/envs/repre/lib/python3.10/http/server.py", line 421, in handle_one_request
    method()
  File "/opt/anaconda/anaconda3/envs/repre/lib/python3.10/http/server.py", line 675, in do_GET
    self.copyfile(f, self.wfile)
  File "/opt/anaconda/anaconda3/envs/repre/lib/python3.10/http/server.py", line 875, in copyfile
    shutil.copyfileobj(source, outputfile)
  File "/opt/anaconda/anaconda3/envs/repre/lib/python3.10/shutil.py", line 198, in copyfileobj
    fdst_write(buf)
  File "/opt/anaconda/anaconda3/envs/repre/lib/python3.10/socketserver.py", line 826, in write
    self._sock.sendall(b)
BrokenPipeError: [Errno 32] Broken pipe

Chao0511 commented 1 year ago

p.s. my data is very small (100 points), much smaller than your huggingface file, so I suppose data size is not the issue.

xiaohk commented 1 year ago

Did you see any error in the web consoles in your browser? You can right click the web page and click "Inspect" to open web console. If you use chrome, you can see this guide.

Chao0511 commented 1 year ago

I found it was due to CORS. Ref: "Python 3 solution" in https://stackoverflow.com/questions/21956683/enable-access-control-on-simple-http-server

When I serve the files with the Python script from that answer instead of the plain http.server, WizMap can access them.

Thank you so much. No more problem. Hope this could help others! :)
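For readers who want a starting point, here is a minimal sketch of a CORS-enabled static file server in the spirit of that fix. It is not the exact script from the linked answer: the class and function names are hypothetical, port 8002 is simply the port used in this thread, and the wildcard origin is a permissive choice that suits a trusted network only.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class CORSRequestHandler(SimpleHTTPRequestHandler):
    """Static file handler that lets pages from any origin fetch the files."""

    def end_headers(self):
        # Without this header the browser blocks WizMap's cross-origin fetch.
        self.send_header("Access-Control-Allow-Origin", "*")
        super().end_headers()


def make_cors_server(directory=".", host="0.0.0.0", port=8002):
    """Create (but do not start) a server for `directory` with CORS enabled.

    Binding to 0.0.0.0 makes the files reachable from other machines.
    """
    handler = partial(CORSRequestHandler, directory=directory)
    return ThreadingHTTPServer((host, port), handler)
```

To serve the current directory, run make_cors_server().serve_forever() and then pass URLs such as http://10.206.215.206:8002/data.ndjson to WizMap.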

xiaohk commented 1 year ago

Ah, I see! Thanks for sharing!

jakanil commented 10 months ago

Thanks for sharing! This is super helpful!