rebane2001 / matterport-dl

A downloader for matterport virtual tours
The Unlicense

AttributeError: 'NoneType' object has no attribute 'group' #3

Closed. tirksam closed this issue 2 years ago

tirksam commented 3 years ago

Hi,

Thanks for the nice tool. I encountered the following error when I tried to download data for a certain property. I also found the following message on Reddit, with the example below:

"AttributeError: 'NoneType' object has no attribute 'group'" error. Here's an example URL of a random listing I found that has the error: https://my.matterport.com/show/?m=AgXyHBR6VCp

Would you have a solution to this problem as someone mentioned on the topic?

tirksam commented 3 years ago

py E:/Python/matterport-dl-main/matterport-dl.py "https://my.matterport.com/show/?m=JP9YgC9agCW"

Here is the error message:

Downloading base page...
Traceback (most recent call last):
  File "E:\Python\matterport-dl-main\matterport-dl.py", line 153, in <module>
    initiateDownload(sys.argv[1])
  File "E:\Python\matterport-dl-main\matterport-dl.py", line 149, in initiateDownload
    downloadPage(url.split("m=")[-1].split("&")[0])
  File "E:\Python\matterport-dl-main\matterport-dl.py", line 130, in downloadPage
    accessurl = re.search(r'"(https://cdn-1.matterport.com/models/.*?/assets/~/{{filename}}\?t=.*?)"', r.text).group(1).replace("{{","{").replace("}}","}")
AttributeError: 'NoneType' object has no attribute 'group'
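For anyone reading the traceback: `re.search` returns `None` when the pattern does not match the downloaded page, and calling `.group(1)` on `None` raises this `AttributeError`. Below is a minimal sketch of guarding that call so the failure is reported more clearly. The helper name `find_access_url` and its error message are hypothetical and not part of matterport-dl; only the pattern is copied from line 130 in the traceback.

```python
import re

# Pattern copied from line 130 of matterport-dl.py, as shown in the traceback.
ACCESS_URL_RE = re.compile(
    r'"(https://cdn-1.matterport.com/models/.*?/assets/~/{{filename}}\?t=.*?)"'
)


def find_access_url(page_text):
    """Hypothetical helper: return the access URL template or raise a clear error."""
    match = ACCESS_URL_RE.search(page_text)
    if match is None:
        # re.search found nothing, so there is no .group() to call.
        raise ValueError("access URL not found in page; the page may have changed")
    # Same post-processing as the original line: "{{filename}}" becomes "{filename}".
    return match.group(1).replace("{{", "{").replace("}}", "}")
```

This does not fix the download. It only turns the opaque `AttributeError` into a message that says which lookup failed.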

bobo-jamson commented 3 years ago

I have fixed the code causing this issue, but now I am encountering a 401 error. I don't know whether the 401 was caused by my fix or whether Matterport has become wise to our ways.

Here's my fix:

In the downloadPage function definition, replace the line defining accessurl with:

    accessurl = re.search(r"\"(https://cdn-1.matterport.com/models/\w+/assets/~/\w+/[\w\<\>.]+\?t=[\w\-]+)\"", r.text).group(1)

The entire function with the change would look like this:

def downloadPage(pageid):
    makeDirs(pageid)
    os.chdir(pageid)
    print("Downloading base page...")
    r = requests.get(f"https://my.matterport.com/show/?m={pageid}")
    staticbase = re.search(r'<base href="(https://static.matterport.com/.*?)">', r.text).group(1)
    ## regex fixed but unsure of downstream consequences -bobo jamson
    accessurl = re.search(r"\"(https://cdn-1.matterport.com/models/\w+/assets/~/\w+/[\w\<\>.]+\?t=[\w\-]+)\"", r.text).group(1)
    # Automatic redirect if GET param isn't correct
    injectedjs = 'if (window.location.search != "?m=' + pageid + '") { document.location.search = "?m=' + pageid + '"; }'
    content = r.text.replace(staticbase,".").replace("https://cdn-1.matterport.com","").replace("https://mp-app-prod.global.ssl.fastly.net","").replace("window.MP_PREFETCHED_MODELDATA",f"{injectedjs};window.MP_PREFETCHED_MODELDATA")
    with open("index.html", "w", encoding="UTF-8") as f:
        f.write(content)
    print("Downloading static assets...")
    downloadAssets(staticbase)
    # Patch showcase.js to fix expiration issue
    patchShowcase()
    print("Downloading model info...")
    downloadInfo(pageid)
    print("Downloading images...")
    downloadPics(pageid)
    print("Downloading model...")
    downloadModel(pageid,accessurl)
    print("Done!")

I'll submit a PR if @rebane2001 would like me to.
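As a rough way to compare the pattern from main with the replacement pattern in this comment, both can be run against sample strings. The two sample URLs below are invented for illustration and are not real Matterport responses; the helper `matches` is hypothetical as well.

```python
import re

# Pattern from main, as shown in the traceback earlier in this thread.
ORIGINAL = r'"(https://cdn-1.matterport.com/models/.*?/assets/~/{{filename}}\?t=.*?)"'
# Replacement pattern proposed in this comment.
PROPOSED = r"\"(https://cdn-1.matterport.com/models/\w+/assets/~/\w+/[\w\<\>.]+\?t=[\w\-]+)\""

# Invented sample strings, for illustration only.
template_style = '"https://cdn-1.matterport.com/models/abc123/assets/~/{{filename}}?t=2-abc-0"'
path_style = '"https://cdn-1.matterport.com/models/abc123/assets/~/tiles/<file>.jpg?t=2-abc-0"'


def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


for name, pattern in (("original", ORIGINAL), ("proposed", PROPOSED)):
    print(name, matches(pattern, template_style), matches(pattern, path_style))
```

The two patterns accept different URL shapes: the original expects a literal `{{filename}}` placeholder, while the proposed one expects a path segment followed by a file name. Which shape the live page actually contains has to be checked against a real response.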

rebane2001 commented 3 years ago

@bobo-jamson This is not about the accessurl. There are a few new POST requests that need to be archived and emulated somehow.

bobo-jamson commented 3 years ago

@rebane2001 Are you referring to the 401 errors being encountered now?

rebane2001 commented 3 years ago

Nope, the 401 errors are something else that I fixed earlier and then reverted.

mattchinnock commented 3 years ago

To be clear, the accessurl currently in main is correct and does not need to be updated, but something else needs to be done?