nicholasjhorton closed this issue 3 years ago.
At present, the downloaded HTML files start with a header, of varying length, that is inserted by the Wayback Machine.
Let's use Jim Albert's paper from volume 8, issue 1 as an example: https://github.com/nicholasjhorton/lost-jse-issues/blob/main/papers/albert_8_1.html
<!-- saved from url=(0104)https://web.archive.org/web/20130307064027/http://www.amstat.org/publications/jse/secure/v8n1/albert.cfm -->
<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><script src="./albert_8_1_files/analytics.js" type="text/javascript"></script>
<script type="text/javascript">window.addEventListener('DOMContentLoaded',function(){var v=archive_analytics.values;v.service='wb';v.server_name='wwwb-app12.us.archive.org';v.server_ms=179;archive_analytics.send_pageview({});});</script>
<script type="text/javascript" src="./albert_8_1_files/bundle-playback.js" charset="utf-8"></script>
<script type="text/javascript" src="./albert_8_1_files/wombat.js" charset="utf-8"></script>
<script type="text/javascript"> __wm.init("https://web.archive.org/web"); __wm.wombat("http://www.amstat.org/publications/jse/secure/v8n1/albert.cfm","20130307064027","https://web.archive.org/","web","/_static/", "1362638427"); </script>
<link rel="stylesheet" type="text/css" href="./albert_8_1_files/banner-styles.css">
<link rel="stylesheet" type="text/css" href="./albert_8_1_files/iconochive.css">
<!-- End Wayback Rewrite JS Include -->
<title>Journal of Statistics Education, V8N1: Albert</title></head>
<body bgcolor="#ffffff" data-new-gr-c-s-check-loaded="14.1036.0" data-gr-ext-installed=""><!-- BEGIN WAYBACK TOOLBAR INSERT -->
<style type="text/css"> body { margin-top:0 !important; padding-top:0 !important; /*min-width:800px !important;*/ } </style>
<script>__wm.rw(0);</script>
<div id="wm-ipp-base" lang="en" style="display: block; direction: ltr;"> </div><div id="wm-ipp-print">The Wayback Machine - https://web.archive.org/web/20130307064027/http://www.amstat.org/publications/jse/secure/v8n1/albert.cfm</div>
<div id="donato" style="position:relative;width:100%;"> <div id="donato-base"> <iframe id="donato-if" src="./albert_8_1_files/donate.html" scrolling="no" frameborder="0" style="width:100%; height:100%"> </iframe> </div> </div><script type="text/javascript"> __wm.bt(650,27,25,2,"web","http://www.amstat.org/publications/jse/secure/v8n1/albert.cfm","20130307064027",1996,"/_static/",["/_static/css/banner-styles.css?v=omkqRugM","/_static/css/iconochive.css?v=qtvMKcIJ"], "False"); __wm.rw(1); </script>
<!-- END WAYBACK TOOLBAR INSERT -->
<h1>Using a Sample Survey Project to Assess the Teaching of Statistical Inference</h1>
I think that it would be preferable to remove most of these lines.
The first line is pretty useful, as it documents where we got this from if someone wants to go spelunking:
<!-- saved from url=(0104)https://web.archive.org/web/20130307064027/http://www.amstat.org/publications/jse/secure/v8n1/albert.cfm -->
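Because this comment records both the snapshot timestamp and the original amstat.org address, it can be parsed mechanically when someone does go spelunking. The sketch below is a minimal, hypothetical helper (`parse_provenance` and `Provenance` are illustrative names, not part of the repository). It assumes the comment always has the form shown above, where the four-digit number in parentheses is the length of the URL that follows (104 characters here), and that snapshot URLs follow the usual `web.archive.org/web/<timestamp>/<original URL>` pattern.

```python
import re
from typing import NamedTuple, Optional

# Matches the comment a browser writes at the top of a saved page, e.g.
# <!-- saved from url=(0104)https://web.archive.org/web/... -->
SAVED_FROM = re.compile(r"<!-- saved from url=\((\d{4})\)(\S+) -->")

# Wayback snapshot URLs: https://web.archive.org/web/<14-digit timestamp>/<original URL>
WAYBACK = re.compile(r"https?://web\.archive\.org/web/(\d{14})/(.+)")


class Provenance(NamedTuple):
    snapshot_url: str
    timestamp: str      # YYYYMMDDhhmmss, empty if not a plain Wayback URL
    original_url: str


def parse_provenance(html: str) -> Optional[Provenance]:
    """Return where a saved page came from, or None if the comment is missing."""
    m = SAVED_FROM.search(html)
    if m is None:
        return None
    declared_len, url = int(m.group(1)), m.group(2)
    # Sanity check: the number in parentheses is the length of the URL.
    if declared_len != len(url):
        raise ValueError(f"URL has {len(url)} characters, comment says {declared_len}")
    w = WAYBACK.match(url)
    if w is None:
        return Provenance(url, "", url)
    return Provenance(url, w.group(1), w.group(2))
```

On Albert's file this yields the timestamp `20130307064027` and the original URL `http://www.amstat.org/publications/jse/secure/v8n1/albert.cfm`.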
Buried later in there is a line:
<title>Journal of Statistics Education, V8N1: Albert</title></head>
We want to keep this.
Lastly, we want to keep the title of the paper (formatted as h1):
<h1>Using a Sample Survey Project to Assess the Teaching of Statistical Inference</h1>
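Taken together, the three pieces to keep suggest a simple cleanup rule: keep the `saved from url` comment and the `<title>` element, drop everything else up to the paper's first `<h1>`, and keep the rest of the file unchanged. Below is a minimal sketch of that rule; `strip_wayback_header` and `clean_file` are hypothetical names. It assumes the Wayback toolbar insert always ends before the paper's first `<h1>`, and that a bare `<body>` tag is an acceptable replacement for the original one. Files without an `<h1>` are returned unchanged so they can be checked by hand.

```python
import re
from pathlib import Path

# Hypothetical cleanup helper: keep the "saved from url" provenance comment,
# the <title> element, and everything from the paper's first <h1> onward.
# Everything else before that <h1> (Wayback scripts, stylesheets, toolbar)
# is dropped.
SAVED_FROM = re.compile(r"<!-- saved from url=\(\d{4}\)\S+ -->")
TITLE = re.compile(r"<title>.*?</title>", re.IGNORECASE | re.DOTALL)
H1 = re.compile(r"<h1[\s>]", re.IGNORECASE)


def strip_wayback_header(html: str) -> str:
    h1 = H1.search(html)
    if h1 is None:
        # No paper title found: leave the file untouched for manual review.
        return html
    comment = SAVED_FROM.search(html)
    title = TITLE.search(html)
    parts = []
    if comment:
        parts.append(comment.group(0))
    # Rebuild a minimal head; assumption: a bare <body> tag is an acceptable
    # replacement for the original one.
    parts.append("<html><head>" + (title.group(0) if title else "") + "</head>")
    parts.append("<body>")
    parts.append(html[h1.start():])
    return "\n".join(parts)


def clean_file(path: Path) -> None:
    """Rewrite one saved paper in place, assuming it is UTF-8 encoded."""
    text = path.read_text(encoding="utf-8")
    path.write_text(strip_wayback_header(text), encoding="utf-8")
```

For example, `clean_file(Path("papers/albert_8_1.html"))` would rewrite Albert's paper in place. Note that this sketch also drops the original `<meta>` charset declaration, which may be worth keeping if the papers contain non-ASCII characters.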
Finished editing the Wayback Machine headers in the HTML files for volume 8.
Thanks! This looks great.