scrapinghub / js2xml

Convert Javascript code to an XML document
MIT License

Syntax Error - Unexpected '>' #50

Open flywire opened 3 years ago

flywire commented 3 years ago

The JavaScript on the site works fine in the browser, so this seems to be a processing error in the parser:

_raise_syntax_error
raise ECMASyntaxError(msg[len(tokens)].format(*tokens))
calmjs.parse.exceptions.ECMASyntaxError: Unexpected '>' at 47:133 between '=' at 47:132 and '{' at 47:135
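
To see what the parser is choking on, the line:column position in the message can be mapped back to the script text. The following is a minimal sketch, not part of js2xml or calmjs: `show_context` is a hypothetical helper, and it assumes the reported positions are 1-based `line:column` pairs, which I have not verified against the calmjs source.

```python
import re


def show_context(source, message, width=30):
    """Return the text around the first 'line:col' position named in a parser message."""
    match = re.search(r"at (\d+):(\d+)", message)
    if match is None:
        return None
    line_no, col_no = int(match.group(1)), int(match.group(2))
    lines = source.splitlines()
    if not 1 <= line_no <= len(lines):
        return None
    line = lines[line_no - 1]
    # Assumption: columns are 1-based, so the reported character sits at index col_no - 1.
    start = max(col_no - 1 - width, 0)
    return line[start:col_no - 1 + width]


# Small stand-in for the page's script; the real one is much longer.
sample = "var chart;\nitems.forEach(item => { chart = item; });"
print(show_context(sample, "Unexpected '>' at 2:21 between '=' at 2:20 and '{' at 2:23"))
```

With the real page, passing the `script` string and `str(exc)` from the caught exception should show the snippet around 47:133.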

Runs fine when the parsing code is commented out:

from bs4 import BeautifulSoup
import requests
import re
import js2xml
from itertools import repeat
from pprint import pprint as pp

url = "https://www.worldweatheronline.com/canberra-weather-averages/australian-capital-territory/au.aspx"
# This second assignment overrides the first, so this is the page actually fetched.
url = "https://www.investsmart.com.au/invest-with-us/investsmart-growth-portfolio"

soup = BeautifulSoup(requests.get(url).content, "html.parser")
# Find the inline script that builds the Highcharts chart.
script = soup.find("script", string=re.compile(r"Highcharts\.Chart")).string
# parsed = js2xml.parse(script)

# print(js2xml.pretty_print(parsed))
print(script)

"""
data = [d.xpath(".//array/number/@value") for d in parsed.xpath("//property[@name='data']")]
categories = parsed.xpath("//property[@name='categories']//string/text()")
output = list(zip(repeat(categories), data))
pp(output)
"""

The error occurs once the script is parsed:

from bs4 import BeautifulSoup
import requests
import re
import js2xml
from itertools import repeat
from pprint import pprint as pp

url = "https://www.worldweatheronline.com/canberra-weather-averages/australian-capital-territory/au.aspx"
# This second assignment overrides the first, so this is the page actually fetched.
url = "https://www.investsmart.com.au/invest-with-us/investsmart-growth-portfolio"

soup = BeautifulSoup(requests.get(url).content, "html.parser")
# Find the inline script that builds the Highcharts chart.
script = soup.find("script", string=re.compile(r"Highcharts\.Chart")).string
# This call raises ECMASyntaxError for the investsmart page.
parsed = js2xml.parse(script)

print(js2xml.pretty_print(parsed))
# print(script)

data = [d.xpath(".//array/number/@value") for d in parsed.xpath("//property[@name='data']")]
categories = parsed.xpath("//property[@name='categories']//string/text()")
output = list(zip(repeat(categories), data))
pp(output)

flywire commented 3 years ago

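js2xml uses the calmjs.parse library for parsing, which only understands ECMAScript 5. It does not support ES6 syntax such as let, and the same goes for arrow functions: the reported position, a '>' directly after '=' and before '{', is consistent with an arrow function (=> {) in the script.

So this script probably can't be parsed by js2xml as it stands.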
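
One way to avoid the traceback is to check a script for newer syntax such as let and arrow functions before handing it to js2xml.parse. The following is a rough sketch under loud assumptions: `find_es6_syntax` and `ES6_PATTERNS` are hypothetical names, and the regular expressions are a crude heuristic that can misfire on text inside strings or comments. It is not a real JavaScript tokenizer.

```python
import re

# Heuristic patterns for syntax newer than ES5; they can misfire inside strings or comments.
ES6_PATTERNS = {
    "arrow function": re.compile(r"=>"),
    "let/const declaration": re.compile(r"\b(?:let|const)\s+[A-Za-z_$]"),
    "template literal": re.compile(r"`"),
}


def find_es6_syntax(script):
    """Return the names of ES6-style constructs that appear to be present in script."""
    return [name for name, pattern in ES6_PATTERNS.items() if pattern.search(script)]


# Hypothetical stand-ins for page scripts.
es5_sample = "var chart = new Highcharts.Chart({series: [{data: [1, 2, 3]}]});"
es6_sample = "let chart; items.forEach(item => { chart = item; });"
print(find_es6_syntax(es5_sample))
print(find_es6_syntax(es6_sample))
```

If the list is non-empty, the script could be skipped or the chart data pulled out with a targeted regular expression instead of a full parse.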