jkirk / coronavirus-infos

Track changes of https://www.sozialministerium.at/Informationen-zum-Coronavirus
MIT License
0 stars 1 forks

Make all HREF absolute #1

Closed 360path closed 4 years ago

360path commented 4 years ago

Make all href absolute (referring to www.sozialministerium.at) so that the versioned pages can be viewed directly.

Easier: create a base tag in the head:

<base href="https://www.sozialministerium.at/">
jkirk commented 4 years ago

I need a way to automate it.

jkirk commented 4 years ago

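I was thinking about a few options here:

  • creating a separate branch?

    • something like base-href where the patched files reside
    • something like upstream where the downloaded files are kept untouched
  • some kind of Makefile which patches the downloaded files if you want to view them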

360path commented 4 years ago

Keep it simple. I would download the pages' HTML into a source folder. From there, the script could use a regex to find the opening <head ...> tag, insert the base tag just after it, and save the output as a generated file where the current pages are. I'm not proficient in shell scripting, but maybe this helps? https://community.idera.com/database-tools/powershell/ask_the_experts/f/learn_powershell_from_don_jones-24/17942/add-html-to-an-existing-web-page
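A minimal sketch of that source-to-generated idea in shell, under a few assumptions: the directory arguments, the generate_pages function name and the BASE_URL variable are illustrative and not part of the repository, and the opening head tag is assumed to be written in lowercase on a single line.

```shell
#!/bin/sh
# Sketch only: read untouched downloads from one directory and write
# patched copies to another. All names here are illustrative assumptions.
BASE_URL='https://www.sozialministerium.at/'

generate_pages() {
    src_dir=$1
    out_dir=$2
    mkdir -p "$out_dir"
    for src in "$src_dir"/*.html; do
        [ -e "$src" ] || continue   # no .html files: the glob stayed literal
        # Match <head> or <head attr="...">, but not <header>.
        # "&" in the replacement re-emits the matched opening tag.
        sed -E 's|<head([[:space:]][^>]*)?>|&<base href="'"$BASE_URL"'">|' \
            "$src" > "$out_dir/$(basename "$src")"
    done
}
```

It could be called as `generate_pages source .` to write the patched pages into the current directory. The base tag is placed on the same line as the opening head tag, which avoids differences in how sed implementations handle newlines in replacements.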

jkirk commented 4 years ago

Adding the line is easy. Something like this would do the trick (changes the files 'in place'):

% sed -i '/<head>/a<base href="https://www.sozialministerium.at/">' Neuartiges-Coronavirus-\(2019-nCov\).html
% sed -i '/<head>/a<base href="https://www.sozialministerium.at/">' Coronavirus---Haeufig-gestellte-Fragen.html

I think it would be the easiest to just apply the change to the downloaded files and avoid tracking "source" and "generated" files. Any objections?
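To automate this for every downloaded page, the same command could be wrapped in a loop. The following is only a sketch: patch_in_place and BASE_TAG are made-up names, and it assumes GNU sed, as the commands above do (one-line `a` text, `-i` without a backup suffix).

```shell
#!/bin/sh
# Sketch only: apply the in-place edit to any number of files.
# Assumes GNU sed; patch_in_place and BASE_TAG are illustrative names.
BASE_TAG='<base href="https://www.sozialministerium.at/">'

patch_in_place() {
    for page in "$@"; do
        # Skip files that already carry a base tag, so that running the
        # script twice does not stack duplicate tags.
        if grep -q '<base ' "$page"; then
            continue
        fi
        # Append the base tag on a new line after the line containing <head>.
        sed -i "/<head>/a$BASE_TAG" "$page"
    done
}
```

Calling it as `patch_in_place ./*.html` would cover both pages without listing them by name, including the one with parentheses in its file name.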

360path commented 4 years ago

No objections. I was just referring to your ideas:

  • creating a separate branch?

    • something like base-href where the patched files reside
    • something like upstream where the downloaded files are kept untouched
  • some kind of Makefile which patches the downloaded files if you want to view them
360path commented 4 years ago

> Adding the line is easy. Something like this would do the trick (changes the files 'in place'):
>
> % sed -i '/<head>/a<base href="https://www.sozialministerium.at/">' Neuartiges-Coronavirus-\(2019-nCov\).html
> % sed -i '/<head>/a<base href="https://www.sozialministerium.at/">' Coronavirus---Haeufig-gestellte-Fragen.html
>
> I think it would be the easiest to just apply the change to the downloaded files and avoid tracking "source" and "generated" files. Any objections?

Didn't have a look at the source. But NB: the <head> tag could have attributes, then this wouldn't work anymore?

jkirk commented 4 years ago

> Didn't have a look at the source. But NB: the <head> tag could have attributes, then this wouldn't work anymore?

Right. But currently it doesn't. If that changes in the future, the script needs to be updated.

360path commented 4 years ago

Yes. But in case that happens, the script will break, I guess. I suppose it's not a priority.
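One caveat: sed exits successfully even when its pattern matches nothing, so a changed <head> tag would leave the pages unpatched without any error. A small check after patching could make that visible. This is a sketch only, and check_patched is a hypothetical helper name, not something in the repository.

```shell
#!/bin/sh
# Sketch only: report every file that is missing the expected base tag.
# check_patched is a hypothetical helper, not part of the repository.
check_patched() {
    rc=0
    for page in "$@"; do
        if ! grep -q '<base href="https://www.sozialministerium.at/">' "$page"; then
            echo "no base tag in $page: the <head> pattern may need updating" >&2
            rc=1
        fi
    done
    # Non-zero if at least one file was not patched.
    return "$rc"
}
```

Run as `check_patched ./*.html` after the sed step, it returns a non-zero status as soon as one page is missing the tag, which a cron job or CI run can pick up.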