trevorcampbell / website_diff

MIT License

"No such file" when file seemingly exists #28

Open joelostblom opened 2 days ago

joelostblom commented 2 days ago

I am trying to use website_diff to review https://github.com/vega/altair/pull/3544, which I am building locally and diffing against main. However, you can reproduce the problem by downloading https://github.com/altair-viz/altair-viz.github.io/archive/master.zip and extracting it into two folders, new and old, before running:

website_diff --old old/ --new new/ --diff diff

I tried it both with two identical copies of the site and after making some changes in new. Both cases raise the following error:

2024-09-17 16:40:44.044 | INFO     | website_diff.crawler:crawl:25 - Crawling prerendered_old/user_guide/data_transformers.html
2024-09-17 16:40:44.074 | INFO     | website_diff.crawler:crawl:25 - Crawling prerendered_old/user_guide/large_datasets.html
2024-09-17 16:40:44.126 | INFO     | website_diff.crawler:crawl:25 - Crawling /_static/chart.html
Traceback (most recent call last):
  File "/home/joel/miniconda3/envs/website_diff/lib/python3.12/site-packages/website_diff/cli.py", line 53, in main
    wd.render.prerender.prerender(old,new,diff,selector,index)
  File "/home/joel/miniconda3/envs/website_diff/lib/python3.12/site-packages/website_diff/render/prerender.py", line 11, in prerender
    old_pages = wd.crawler.crawl(os.path.join(old, index), set(), selector)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/joel/miniconda3/envs/website_diff/lib/python3.12/site-packages/website_diff/crawler.py", line 57, in crawl
    crawl(os.path.join(curdir, ref), gathered, content_selector, crawled)
  File "/home/joel/miniconda3/envs/website_diff/lib/python3.12/site-packages/website_diff/crawler.py", line 57, in crawl
    crawl(os.path.join(curdir, ref), gathered, content_selector, crawled)
  File "/home/joel/miniconda3/envs/website_diff/lib/python3.12/site-packages/website_diff/crawler.py", line 57, in crawl
    crawl(os.path.join(curdir, ref), gathered, content_selector, crawled)
  [Previous line repeated 56 more times]
  File "/home/joel/miniconda3/envs/website_diff/lib/python3.12/site-packages/website_diff/crawler.py", line 28, in crawl
    with open(filepath, 'r') as f:
         ^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/_static/chart.html'

Cleaning up directory diff

It does seem like both directories have a file in that location:

[screenshot]

All the other crawled files are under prerendered_old; I'm not sure why _static is not.

trevorcampbell commented 1 day ago

Interesting, thanks for posting the issue, @joelostblom! We'll look into it ASAP.

trevorcampbell commented 1 day ago

I think the problem is that it's using an absolute path:

FileNotFoundError: [Errno 2] No such file or directory: '/_static/chart.html'
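
One way such a path could arise, given that the traceback shows `crawl(os.path.join(curdir, ref), ...)`: `os.path.join` discards every earlier component once it meets one that is itself absolute, so a reference starting with `/` would escape the crawl directory. The sketch below illustrates that behaviour and one possible workaround. It uses `posixpath` (what `os.path` is on Linux) so the results are the same on any platform; `resolve_ref` and `site_root` are hypothetical names for illustration, not part of website_diff.

```python
import posixpath

curdir = "prerendered_old/user_guide"

# A relative reference stays under curdir after joining.
relative = posixpath.join(curdir, "../_static/chart.html")

# A reference starting with "/" replaces everything before it.
absolute = posixpath.join(curdir, "/_static/chart.html")


def resolve_ref(site_root, curdir, ref):
    """Hypothetical helper: map an href/src value to a path inside site_root."""
    if ref.startswith("/"):
        # Root-relative: interpret against the site root, not the filesystem root.
        joined = posixpath.join(site_root, ref.lstrip("/"))
    else:
        # Ordinary relative reference: interpret against the current page's directory.
        joined = posixpath.join(curdir, ref)
    # Collapse any "../" segments.
    return posixpath.normpath(joined)


print(relative)
print(absolute)
print(resolve_ref("prerendered_old", curdir, "/_static/chart.html"))
```

Under these assumptions both forms of the reference would resolve to the same file inside the prerendered copy. Whether this matches what the crawler actually receives as `ref` would need checking against the real code.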

But what I see in a lot of the html files here is ../_static/chart.html:

large_datasets.html:  <script src="../_static/vendor/fontawesome/6.5.2/js/all.min.js?digest=dfe6caa3a7d634c4db9b"></script>
large_datasets.html:    <script src="../_static/documentation_options.js?v=d2f74b0e"></script>
large_datasets.html:    <script src="../_static/doctools.js?v=9a2dae69"></script>
large_datasets.html:    <script src="../_static/sphinx_highlight.js?v=dc90522c"></script>
large_datasets.html:    <script src="../_static/clipboard.min.js?v=a7894cd8"></script>
large_datasets.html:    <script src="../_static/copybutton.js?v=f281be69"></script>
large_datasets.html:    <script src="../_static/design-tabs.js?v=f930bc37"></script>

I wonder if somehow we've got a bug in filename parsing that breaks on ../
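
One way to narrow this down would be to check whether any page actually contains a reference starting with `/`, as opposed to a `../` reference being mangled somewhere. A rough sketch using only the standard library follows; `find_root_relative` is a hypothetical helper written for this check, not something website_diff provides.

```python
from html.parser import HTMLParser


class RootRelativeRefs(HTMLParser):
    """Collect href/src values that start with a single '/'."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name not in ("href", "src") or not value:
                continue
            # Skip protocol-relative URLs such as //cdn.example/x.js.
            if value.startswith("/") and not value.startswith("//"):
                self.refs.append(value)


def find_root_relative(html_text):
    """Return all root-relative href/src values found in html_text."""
    parser = RootRelativeRefs()
    parser.feed(html_text)
    return parser.refs


sample = (
    '<a href="/_static/chart.html">chart</a>'
    '<script src="../_static/doctools.js"></script>'
)
print(find_root_relative(sample))
```

Running this over the contents of `large_datasets.html` (the last page crawled before the failure) should show whether a `/_static/chart.html` reference is present in the source.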