webmetrics / browsermob-proxy

NOTICE: this project has been forked and is being maintained at https://github.com/lightbody/browsermob-proxy
Apache License 2.0

ProxyServer.start() throws String index out of range exception #111

Open hiansh opened 11 years ago

hiansh commented 11 years ago

Here is the stack trace of the exception:

Caused by String index out of range: -1

java.lang.String.substring(String.java:1911)
at cz.mallat.uasparser.fileparser.PHPFileParser.loadFile(PHPFileParser.java:65)
at cz.mallat.uasparser.fileparser.PHPFileParser.<init>(PHPFileParser.java:29)
at cz.mallat.uasparser.UASparser.loadDataFromFile(UASparser.java:237)
at cz.mallat.uasparser.CachingOnlineUpdateUASparser.<init>(CachingOnlineUpdateUASparser.java:55)
at cz.mallat.uasparser.CachingOnlineUpdateUASparser.<init>(CachingOnlineUpdateUASparser.java:26)
at org.browsermob.proxy.http.BrowserMobHttpClient.<init>(BrowserMobHttpClient.java:61)
at org.browsermob.proxy.ProxyServer.start(ProxyServer.java:67)

The contents of the userAgentString.properties file in the tmp directory are:

Mon Oct 08 11:17:40 EDT 2012

currentVersion=20121005-03
lastUpdateCheck=1349709460516

Workaround: Delete the userAgentString.properties file in the tmp directory once.

davehunt commented 11 years ago

I've had this issue too. Could we catch this exception and proceed without the cached file? Or remove the file and proceed? At the moment this can halt all proxy servers from launching on our CI.
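The "catch this exception, remove the file and proceed" idea could look roughly like the sketch below. It is not code from browsermob-proxy or uasparser: `CacheRecovery`, `loadWithRecovery`, and the `loader` function are hypothetical names standing in for whatever actually reads the cached file. The only thing taken from the report is that loading a bad cache fails with an unchecked exception thrown from `String.substring`.

```java
import java.io.File;
import java.util.function.Function;

/** Hypothetical helper: retry a cache load once after deleting an unreadable cache file. */
final class CacheRecovery {
    private CacheRecovery() {}

    /**
     * Runs the loader against the cache file. If the loader fails with an
     * unchecked exception (for example a StringIndexOutOfBoundsException
     * caused by a truncated file), the file is deleted and the loader is
     * run one more time. A failure on the second attempt is not caught.
     */
    static <T> T loadWithRecovery(File cacheFile, Function<File, T> loader) {
        try {
            return loader.apply(cacheFile);
        } catch (RuntimeException firstFailure) {
            // Assumption: a failed load means the cached file is unusable.
            if (cacheFile.exists() && !cacheFile.delete()) {
                throw new IllegalStateException(
                        "Cannot delete unreadable cache file " + cacheFile, firstFailure);
            }
            return loader.apply(cacheFile);
        }
    }
}
```

This automates the manual workaround above (deleting the file once), so a damaged cache would cost one extra load rather than blocking every proxy start.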

lightbody commented 11 years ago

Strange, I thought I had removed the code that used CachingOnlineUpdateUASparser. It's been a while, but there might be a command line switch or something related to this. I'll have to dig into it more deeply though.

lightbody commented 11 years ago

OK I checked it out more. The last time this came up I added a command line switch (uaCache -> The number of days to cache a database of User-Agent records from http://user-agent-string.info) to help. It won't help with startup issues though, since the cache has to be loaded at least once. Setting it to zero ensures it's only loaded once, but that doesn't help either. I forget why I didn't just take a copy of the UA Cache and bundle it in the jar. I'm betting because it was harder than it sounds :) Anyway, patches welcome, but if you submit them please do it at http://bmp.lightbody.net, where I'm maintaining the code now.

davehunt commented 11 years ago

@lightbody I'm not sure I understand what this code is doing. Could you explain it a little more before I consider if it's something I can help to fix?

lightbody commented 11 years ago

I'm going off memory here so I might be slightly off, but basically we need to parse the User-Agent header in order to classify the browser type in the HAR. I found a Java API that calls a regularly-updated database of User-Agent strings and that's what we're using to make sense of the UAs. But when that service is down, then the proxy has problems. See http://user-agent-string.info/api for more info.

davehunt commented 11 years ago

How often does this need updating? Could we perhaps fall back to a cached version when an update fails? Alternatively, could the location of the cached file be unique to each startup of the server? That way the file will never pre-exist, as that's what seemed to cause my issue.
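Both suggestions can be sketched in a few lines of plain Java. Everything named here is an assumption made for illustration: `UaCacheFallback`, `newPerStartupCacheDir`, `refreshOrFallback`, and the `bmp-ua-cache-` prefix do not exist in browsermob-proxy, and the `onlineUpdate` supplier stands in for the real call to the User-Agent service.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Supplier;

/** Hypothetical sketch of the two ideas: a per-startup cache location and an update fallback. */
final class UaCacheFallback {
    private UaCacheFallback() {}

    /**
     * Creates a fresh, uniquely named directory under the system temp
     * directory, so a cache file left behind by an earlier run is never reused.
     */
    static Path newPerStartupCacheDir() throws IOException {
        return Files.createTempDirectory("bmp-ua-cache-");
    }

    /**
     * Tries the online update; if it fails with an unchecked exception,
     * returns the last known good data instead of propagating the failure.
     */
    static <T> T refreshOrFallback(Supplier<T> onlineUpdate, T lastGood) {
        try {
            return onlineUpdate.get();
        } catch (RuntimeException updateFailure) {
            return lastGood;
        }
    }
}
```

The trade-off is that a per-startup directory gives up the cache between runs, so every start would need either a successful update or some bundled data to fall back on.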

lightbody commented 11 years ago

Rarely. We could probably just use a local database and just update it occasionally, but that would require some more work. I tried it once before and it turned out to be trickier than I expected.
