Open dpordomingo opened 5 years ago
This is a known problem in bblfsh:
We use supported languages API to provide meaningful error to the user.
So the question is: do we want to implement workaround inside engine specially for C#?
//cc @creachadair maybe you have some ETA for the proper fix on bblfsh side?
My understanding was that the fix from https://github.com/bblfsh/bblfshd/pull/251 was supposed to address this without requiring a change to Engine. To really fix this we will need a solution for aliasing, for which we don't have a timeline yet (@bzz has started an RFC for the question, but we have had higher-priority work so far).
Is the above workaround not working for you?
it does not. We use enry + bblfsh supported languages api to check that bblfsh is able to parse a file. This api currently doesn't return c#
, only csharp
.
it does not. We use enry + bblfsh supported languages api to check that bblfsh is able to parse a file. This api currently doesn't return
c#
, onlycsharp
.
I see. So is the issue not that you're unable to get a parse for the file, but that the list of languages doesn't match the string on the Engine side so that Engine doesn't even ask?
I added this as a known issue to the release notes.
I would propose to return this issue to backlog and wait for the fix from LA team instead of implementing work around on our side due to:
parse
command, not gitbaseparse
command completely from engine@carlosms wdyt?
Makes sense, let's wait for a proper fix, I don't think there is any rush.
It also fails for cpp (c++ is not supported) and bash (shell is not supported).
It seems like C# is then not supported in any way other than language detection for the Engine. I'd consider simply dropping mentions to C# on the release notes for 0.11.
just for the record:
SELECT UAST(f.blob_content, 'csharp') AS uast
FROM refs AS r
NATURAL JOIN commit_files
NATURAL JOIN files AS f
WHERE r.ref_name = 'HEAD' AND f.file_path REGEXP('.*.cs')
It seems like C# is then not supported in any way other than language detection for the Engine. I'd consider simply dropping mentions to C# on the release notes for 0.11.
Dropping mentions is a safe choice. But note that the issue is in the language detection, not the functionality: Specifically the detection labels a C# file as "C#" which doesn't match Babelfish "csharp" and thus the file never makes it to the parsing step.
So this is a real bug, and needs a real fix, but the core functionality is solid. So I think we can demo the feature safely, modulo the need to work around that issue. Unfortunately, while the solution isn't difficult, it touches a bunch of plumbing, so it isn't as easy as patching one layer.
I've tested it a bit more and this works with 0.11.0:
srcd parse uast csharp.cs --lang csharp
So the real limitation is that you can't rely on automatic language detection, but you can still get UAST if you set the language manually (like @smacker said for the SQL queries). I will add this information to the known issues section of the release notes.
I've also tried a quick workaround, in case we think it's worth to release engine again to fix this issue on our side until it is handled by bblfshd.
If we remove this check, these 3 cases would work:
srcd parse uast csharp.cs
srcd parse uast csharp.cs --lang c#
srcd parse uast csharp.cs --lang csharp
why can't we check if one of the aliases of the required language is one of the exposed by bblfsh
?
I mean:
enry
to guess the language of the file,enry
returns c#
,checkSupportedLanguage(availableLangs []languages, requested language)
filter could check if the requested
one, or one of its enry aliases is in the availableLangs
list.Doing so, the filter will be still there, but working properly.
As a workaround, yes that makes sense. Now the question is if we want to do a workaround, or just wait for the proper handling of aliases by bblfshd.
Sorry, I missed something... I don't see why is it a problem on bblfsh
...
I ran bblfsh
container with images:
$ docker run --rm -d --name bblfshd --privileged -p 9999:9432 bblfsh/bblfshd:v2.11.8-drivers
And then I asked for the uast of a c#
file in all possible ways:
$ bblfsh-cli example.cs
$ bblfsh-cli -l csharp example.cs
$ bblfsh-cli -l c# example.cs
all of them worked.
It also worked for c++
$ bblfsh-cli example.cpp
$ bblfsh-cli -l c++ example.cpp
$ bblfsh-cli -l cpp example.cpp
:thinking: not for bash
When I tried asking about a non existent driver language, it properly failed:
$ bblfsh-cli -l cobol example.cobol
couldn't parse example.cobol: rpc error: code = Unknown desc = unexpected error: runtime failure: missing driver for language "cobol"
So my question is:
Does the filter makes sense in source{d} Engine?
bblfsh
if we already know that there is no driver for it -> then yes: it makes sense, and we could enhance it as I explained.bblfsh
, we can drop it, and let bblfsh
say that it could not parse it.for the proper handling of aliases by bblfshd.
sorry for butting in, but just for the context - as noted in https://github.com/bblfsh/bblfshd/issues/263 so far ETA for proper aliases in bblfsh is Q22019.
Meanwhile @dpordomingo suggestion https://github.com/src-d/engine/issues/297#issuecomment-473219312 on applying public enry API LanguageByAliasMap
to the results of protocol.SupportedVersion
before comparing it to enry.GetLanguage
output inside the engine product sounds very safe and reasonable feature.
I would argue that this is different from what has been discussed in this thread before - it's not a workaround specific for C# and it should be working the same way and not require a change later, after introduction of proper aliases support on bblfsh side.
So my question is:
Does the filter makes sense in source{d} Engine?
- If we want to avoid sending files to
bblfsh
if we already know that there is no driver for it -> then yes: it makes sense, and we could enhance it as I explained.- If we don't care about sending files to
bblfsh
, we can drop it, and letbblfsh
say that it could not parse it.
We didn't have this check on the installed drivers, then we added it after the discussion on https://github.com/src-d/engine/pull/162, creating https://github.com/src-d/engine/issues/163 & https://github.com/src-d/engine/pull/186. So yes, the only reason it's there is to show a nicer error message in case the user requests to parse a file for which we don't have a driver. Another possibility discussed in #186 was to send the parse request and then check the error returned, but that required support from https://github.com/bblfsh/client-go/issues/106.
From the user's perspective, it'd be nice if we can avoid doing extra work for queries we know can't succeed. Once we have sorted out the mapping correctly, I think that's a reasonable tactic. But early filtering does force a coupling between the two APIs that—in retrospect—I suppose we didn't think through clearly enough.
In the meantime: I'd like to make sure the user gets an "understandable" result even if we have to do a little extra work (e.g., attempt to parse a file that a human might know won't work). The question for the UI, though, is how to tell the difference between "this failed because I couldn't find a driver for it" (which is what the pre-filter is a proxy for) and "this failed because the driver reported an error".
I should know but do not: Do we already use canonical error codes to distinguish cases like this, which could inform the UI how to give feedback to the user when a parse fails?
Workaround from v0.12 release notes:
Known Issues
297:
srcd parse
does not detect the language automatically for C#, C++, or bash files. For these languages you will need to set--lang
manually. For example:$ srcd parse uast file.cs --lang csharp $ srcd parse uast file.cpp --lang cpp $ srcd parse uast file.bash --lang bash
discovered at https://github.com/src-d/empathy-sessions/issues/37#issuecomment-470045137
Running parse over a
.cs
file is not workingWorkaround
Until this is fixed,
csharp
needs to be passed explicitly to bblfshd. For example: