o-smirnov / fundamentals_of_interferometry

Fundamentals of Radio Interferometry
GNU General Public License v2.0

(Some code to find) Dead Links #41

Open jfunction opened 7 years ago

jfunction commented 7 years ago

I have written a hacky script to look for dead links. Open a console and cd to path/to/fundamentals_of_interferometry. Then start ipython and copy the code below to the clipboard (Ctrl-C):

from __future__ import print_function  # lets the script run under Python 2 and 3

import io
import json
import os
import re

valid_dirs = '''0_Introduction
1_Radio_Science
2_Mathematical_Groundwork
3_Positional_Astronomy
4_Visibility_Space
5_Imaging
6_Deconvolution
7_Observing_Systems
8_Calibration
9_Practical'''.split('\n')

def parse_links(fname):
    """Return the set of .ipynb link targets found in a notebook's markdown cells."""
    with io.open(fname, encoding='utf-8') as f:
        dom = json.load(f)
    links = []
    for cell in dom['cells']:
        if cell['cell_type'] == 'markdown':
            for line in cell['source']:
                # Anything in parentheses is a candidate markdown link target.
                found = re.findall(r'\([^()]+?\)', line)
                # Keep notebook links only; anything containing a space is assumed to be prose.
                links += [fnd for fnd in found if '.ipynb' in fnd and ' ' not in fnd]
    # Strip the surrounding parentheses.
    return {l[1:-1] for l in links}

def is_link_exists(directory, link):
    """Check that the linked file exists and, if an anchor is given, that its text appears in the file."""
    url, _, anchor = link.partition('#')
    try:
        with io.open(os.path.join(directory, url), encoding='utf-8') as f:
            txt = f.read() if anchor else ''
    except (IOError, OSError):
        print("(Wrong URL) ", end=' ')
        return False
    if anchor and anchor not in txt:
        print("(Wrong ID)  ", end=' ')
        return False
    return True

d = {}
for valid_dir in valid_dirs:
    d[valid_dir] = set()
    for fname in os.listdir(valid_dir):
        if fname.endswith('.ipynb'):
            d[valid_dir] |= parse_links(os.path.join(valid_dir, fname))
    print('\n***', valid_dir, '***')
    for link in sorted(d[valid_dir]):
        if not is_link_exists(valid_dir, link):
            print(link)

Now go to your ipython console and type %paste, which runs the code from the clipboard.

Look for whatever folder you care about, e.g. if you are working on Chapter 2, look for something like this:

*** 2_Mathematical_Groundwork ***
(Wrong URL)  ../3_Positional_Astronomy/3_0_introduction.ipynb
(Wrong ID)   2_2_important_functions.ipynb#math:eq:1_003
(Wrong ID)   2_2_important_functions.ipynb#math:sec:Rectangle_function
(Wrong ID)   2_6_cross_correlation_and_auto_correlation.ipynb#math:sec:eulers_formula
(Wrong URL)  2_8_the_discrete_fourier_transform.ipynb#math:sec:fast_fourier_tranforms
(Wrong URL)  2_8_the_discrete_fourier_transform.ipynb#math:sec:the_discrete_convolution_definition_and_discrete_convolution_theorem
(Wrong URL)  2_8_the_discrete_fourier_transform.ipynb#math:sec:the_discrete_fourier_transform_as_coefficients_for
(Wrong URL)  2_8_the_discrete_fourier_transform.ipynb#math:sec:the_discrete_fourier_transform_definition

This says that the notebooks in the 2_Mathematical_Groundwork folder contain the bad links listed above. Each link is resolved relative to that folder. To find where a bad link is used, cd into the folder and search for it with ack-grep, for example: ack-grep "2_2_important_functions.ipynb#math:sec:Rectangle_function", which outputs:

2_4_the_fourier_transform.ipynb
428:    "Remember the [rectangle function &#10142;](2_2_important_functions.ipynb#math:sec:Rectangle_function) <!--\\ref{math:sec:Rectangle_function}--> $\\Pi(x) = \n",

So now you know where the bad link is: line 428 of 2_4_the_fourier_transform.ipynb. Hopefully this helps.
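If ack-grep is not installed, the same search can be done from Python. This is only a rough sketch: find_link is a hypothetical helper name (not part of the script in this issue), and it assumes the notebooks are UTF-8 text and that the link appears on a single line of the raw .ipynb file.

```python
import io
import os


def find_link(directory, link):
    """Return (filename, line_number, line) for every line of every .ipynb
    file in `directory` that contains `link` as a literal substring."""
    hits = []
    for fname in sorted(os.listdir(directory)):
        if not fname.endswith('.ipynb'):
            continue  # only notebooks are searched
        path = os.path.join(directory, fname)
        with io.open(path, encoding='utf-8') as f:
            # Line numbers start at 1, matching what grep-like tools report.
            for lineno, line in enumerate(f, start=1):
                if link in line:
                    hits.append((fname, lineno, line.strip()))
    return hits
```

From the repository root, find_link('2_Mathematical_Groundwork', '2_2_important_functions.ipynb#math:sec:Rectangle_function') should then list the same file and line number that ack-grep reports.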

jfunction commented 7 years ago

Oh, and "Wrong URL" means the file doesn't exist, while "Wrong ID" means the id (the part after the #) doesn't exist (i.e. that text doesn't appear anywhere in the file).
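To make the two cases concrete, here is a minimal sketch of the same check that returns a label instead of printing. check_link is a hypothetical name used only for illustration, and the notebooks are assumed to be UTF-8 text. Note that the ID test is a plain substring match, so an ID that appears anywhere in the target file (even inside a comment) counts as present.

```python
import io
import os


def check_link(directory, link):
    """Classify `link` (relative to `directory`) as 'ok', 'wrong_url' or 'wrong_id'."""
    url, _, anchor = link.partition('#')
    path = os.path.join(directory, url)
    if not os.path.isfile(path):
        return 'wrong_url'  # the linked file does not exist
    if anchor:
        with io.open(path, encoding='utf-8') as f:
            if anchor not in f.read():
                return 'wrong_id'  # the file exists but never mentions the id
    return 'ok'
```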