Open mrT23 opened 1 month ago
Steps to add a new language to multilspy:
At step 5, you may need the complete server logs to debug. You can pass a logger to the init method for this purpose.
Please feel free to reach out to me if you face any issues.
Thanks a lot for the feedback! Will try.
@mrT23 If you end up getting JavaScript support working on your own fork, I'd love to see it, as I will be needing the same in the coming weeks!
@LakshyAAAgrawal And thank you for the excellent repo, it has saved me a lot of time with a system that requires accurately parsing code references.
One of the primary intentions I had with this repository was for it to be a reference for the community to implement and use clients for various language servers. While most language server clients target an IDE like VS Code, multilspy is intended to be a repository for programmatic uses of language servers, as opposed to user-facing use cases like those in VS Code.
I would be really glad if you could contribute your implementation of a js client!
@LakshyAAAgrawal I ended up implementing a JS/TS client with the typescript-language-server package about a month ago. It's actually being used in production now, but I definitely didn't follow your procedures haha. Will follow your guide when we implement Golang and PHP next!
@mrT23 @imanewman This is our implementation, please lmk if you find this useful for your implementation(s) in any way. It should work out of the box, but you'll need NPM + TypeScript + the language server installed locally.
@themichaelusa I checked out the repository, and it looks well implemented!
Would it be okay with you and your team to create a PR and add it to multilspy so that the wider community could make use of it?
Yes absolutely @LakshyAAAgrawal! Our intention was always to merge our work into multilspy. It has served us very well so far and we'd like to give back to the community. I will open it later today once I fill out the Microsoft CLA.
@themichaelusa Thanks for sharing the code.
I cloned the draft PR and did some QA. On some repos it worked; on others, it got stuck. I saw this happen with some of our internal code, but also with public repos. If I am doing something wrong, do let me know.
Here is a reproducible example:
1) clone https://github.com/gvergnaud/ts-pattern
2) run:
import asyncio

from multilspy import SyncLanguageServer, LanguageServer
from multilspy.multilspy_config import MultilspyConfig
from multilspy.multilspy_logger import MultilspyLogger


async def run():
    config = MultilspyConfig.from_dict({"code_language": "typescript"})  # Also supports "python", "rust", "csharp"
    logger = MultilspyLogger()
    repo = "local_path_to_replace/ts-pattern"
    rel_file = "src/match.ts"
    print("Starting server...")
    lsp = LanguageServer.create(config, logger, repo)
    print("Server started!")
    async with lsp.start_server():
        print("request_document_symbols...")
        result = await lsp.request_document_symbols(
            rel_file,  # Filename of location where request is being made
        )
        print("request_document_symbols done!")
        print(result)


if __name__ == '__main__':
    asyncio.run(run())
It gets stuck here.
Interestingly, when I run only on the src directory:
repo = "local_path/ts-pattern/src"
rel_file = "match.ts"
it works, so I think the LSP might have problems with repos that are too large, or with some specific files or extensions present somewhere in the repo.
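While narrowing this down, it may help to make the hang fail loudly instead of blocking forever. The following is only a sketch using the standard library's `asyncio.wait_for`; `with_timeout` and `hanging_request` are hypothetical names, and the stand-in coroutine would be replaced by the real call (for example `lsp.request_document_symbols(rel_file)` inside the `start_server` block).

```python
import asyncio


async def with_timeout(coro, seconds, label):
    """Await `coro`; raise TimeoutError with a readable label if it hangs."""
    try:
        return await asyncio.wait_for(coro, timeout=seconds)
    except asyncio.TimeoutError:
        raise TimeoutError(f"{label} did not finish within {seconds}s") from None


# Hypothetical stand-in for a request that never returns.
async def hanging_request():
    await asyncio.sleep(3600)


async def demo():
    try:
        await with_timeout(hanging_request(), 0.1, "request_document_symbols")
    except TimeoutError as exc:
        return str(exc)


message = asyncio.run(demo())
print(message)
```

This does not fix the underlying problem; it only turns a silent hang into an error that names the request that stalled.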
@mrT23 Yeah, it's fairly likely that if node_modules isn't being included when you index ts-pattern, it's related to that event loop fix that I reverted. Or you aren't filtering out non-JS/TS files.
But I'm not sure of the exact solution yet, it's pretty late for me right now. Will investigate tomorrow morning. I think I have some code that can help you with that filtering step too.
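One quick way to check either guess is to count what is actually on disk under the repo root, with and without node_modules. This is a rough diagnostic sketch using only the standard library; `count_extensions` is a hypothetical helper, and the temporary tree merely stands in for a real checkout.

```python
import os
import tempfile
from collections import Counter


def count_extensions(root_path, skip_dirs=()):
    """Count files per extension under root_path, not descending into skip_dirs."""
    counts = Counter()
    for root, dirs, files in os.walk(root_path):
        # Prune in place so os.walk does not descend into skipped directories.
        dirs[:] = [d for d in dirs if d not in skip_dirs]
        for name in files:
            ext = os.path.splitext(name)[1] or "<no extension>"
            counts[ext] += 1
    return counts


# Small throwaway tree standing in for a real checkout.
with tempfile.TemporaryDirectory() as repo:
    for rel in ("src/match.ts", "README.md", "node_modules/dep/index.js"):
        path = os.path.join(repo, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()

    everything = count_extensions(repo)
    without_deps = count_extensions(repo, skip_dirs=("node_modules",))

print(dict(everything))
print(dict(without_deps))
```

Pointing `count_extensions` at the real ts-pattern checkout would show how much of the tree is non-JS/TS or lives under node_modules.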
@mrT23 Haven't had time to test your code out yet - mainly because I've actually been using the synchronous LSP client in multilspy, so kinda unfamiliar with the async behavior.
But here are some helper functions I wrote that essentially prune out all non-relevant file types for a given repo. They also work on multilingual monorepos, splitting them into unique trees per language group (e.g. python) that still preserve their structure so multilspy can still work over them.
EXT_TO_LANGUAGE_DATA is essentially just a dictionary with this form:
{
    ".py": {
        "is_code": true,
        "language_mode": "python"
    },
    ...
}
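For illustration, a Python version of such a table might look like the sketch below. Only the ".py" entry comes from the form shown above; the other entries and the `lookup` helper are hypothetical and are here just to show how the table is read.

```python
import os

# Only ".py" is taken from the thread; the remaining entries are assumed examples.
EXT_TO_LANGUAGE_DATA = {
    ".py": {"is_code": True, "language_mode": "python"},
    ".ts": {"is_code": True, "language_mode": "typescript"},
    ".js": {"is_code": True, "language_mode": "javascript"},
    ".md": {"is_code": False, "language_mode": "markdown"},
}


def lookup(path):
    """Return (language_mode, is_code) for a path, or (None, False) if unknown."""
    ext = os.path.splitext(path)[1]
    info = EXT_TO_LANGUAGE_DATA.get(ext, {})
    return info.get("language_mode"), info.get("is_code", False)


print(lookup("src/match.ts"))
print(lookup("README.md"))
print(lookup("Makefile"))
```

Unknown extensions fall through to `(None, False)`, so files the table does not know about are never treated as code.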
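import os
import shutil
import uuid

# Not defined in the original snippet; value assumed from the path mentioned in
# the comment inside copy_and_split_root_by_language_group.
TMP_DIR_PARENT = "/tmp/callgraph_root_copies"

LANGUAGE_TO_LSP_LANGUAGE_MAP = {
    "python": "python",
    "javascript": "typescript",
    "typescript": "typescript",
    "java": "java",
    "rust": "rust",
    "csharp": "csharp"
}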

def get_all_paths_from_root_relative(root_path):
    # Walk the tree once and return every file as both an absolute and a relative path.
    abs_paths, rel_paths = [], []
    for root, dirs, files in os.walk(root_path):
        for file in files:
            abs_path = os.path.join(root, file)
            relpath = os.path.relpath(abs_path, root_path)
            abs_paths.append(abs_path)
            rel_paths.append(relpath)

    return abs_paths, rel_paths

def get_language_from_ext(path):
    # Map a file extension to (lsp_language, language, is_code); unknown extensions give (None, None, False).
    _, ext = os.path.splitext(path)
    language_info = EXT_TO_LANGUAGE_DATA.get(ext, {})
    is_code = language_info.get("is_code", False)
    language = language_info.get("language_mode", None)
    lsp_language = LANGUAGE_TO_LSP_LANGUAGE_MAP.get(language, None)

    return lsp_language, language, is_code

def copy_and_split_root_by_language_group(abs_root_path):
    abs_paths, _ = get_all_paths_from_root_relative(abs_root_path)

    # Collect the LSP languages that have at least one code file in the repo.
    languages = set()
    for p in abs_paths:
        lsp_language, language, is_code = get_language_from_ext(p)
        if is_code:
            languages.add(lsp_language)

    languages = [l for l in languages if l]
    num_root_copies = len(languages)
    copy_paths = []

    # copy the root directory num_root_copies times into /tmp/callgraph_root_copies/{random_hash}
    for _ in range(num_root_copies):
        random_hash = str(uuid.uuid4()).split('-')[0]
        root_copy_path = os.path.join(TMP_DIR_PARENT, random_hash)
        shutil.copytree(abs_root_path, root_copy_path)
        copy_paths.append(root_copy_path)

    # In each copy, delete every file that does not belong to that copy's language.
    for copy_path, language in zip(copy_paths, languages):
        for root, dirs, files in os.walk(copy_path):
            for file in files:
                file_language, _, is_code = get_language_from_ext(file)
                if not (file_language == language and is_code):
                    os.remove(os.path.join(root, file))

    # remove copy_paths that only have directories and no files
    nonempty_copy_paths = []
    for copy_path, language in zip(copy_paths, languages):
        files_set = set()
        for root, dirs, files in os.walk(copy_path):
            for file in files:
                files_set.add(file)

        if not files_set:
            print(f"copy_path: {copy_path} is empty")
            shutil.rmtree(copy_path)
            continue

        nonempty_copy_paths.append((copy_path, language))

    return nonempty_copy_paths
Hope this helps you for now. I'll look at your example tonight. Will also discuss w/ @LakshyAAAgrawal if this automated approach for filtering out language-specific subtrees is appropriate for inclusion in multilspy with a follow-up PR.
The problem occurs with both sync and async.
For each repo, copying all the relevant files to a cloned repo seems to me like a non-optimal operation. I think we should be able to tell the LSP which file types to take, instead of cloning by default.
I generally agree that cloning isn't perfect. Luckily this is code that runs outside of the package.
> I think we should be able to tell the LSP which file types to take, instead of cloning by default.
I do wonder if this is possible in initialize_params.json, e.g. by programmatically excluding certain subpaths or file extensions.
Will look into it after this PR is merged.
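Until the server can be told which file types to take, one lighter option might be to copy only the matching files in a single pass, rather than copying the whole tree and deleting afterwards. This is a sketch built on the `ignore` callback of `shutil.copytree`; `keep_only_extensions` is a hypothetical helper, and the temporary tree stands in for a real repo.

```python
import os
import shutil
import tempfile


def keep_only_extensions(allowed_exts):
    """Build a shutil.copytree `ignore` callback that skips files whose
    extension is not in allowed_exts. Directories are always kept."""
    def ignore(directory, names):
        return [
            name for name in names
            if not os.path.isdir(os.path.join(directory, name))
            and os.path.splitext(name)[1] not in allowed_exts
        ]
    return ignore


with tempfile.TemporaryDirectory() as tmp:
    # Throwaway source tree standing in for a real repo.
    src_root = os.path.join(tmp, "repo")
    for rel in ("src/match.ts", "README.md", "docs/guide.md"):
        path = os.path.join(src_root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()

    dst_root = os.path.join(tmp, "repo_typescript")
    shutil.copytree(src_root, dst_root, ignore=keep_only_extensions({".ts"}))

    # List the copied files relative to the new root.
    copied_files = sorted(
        os.path.relpath(os.path.join(root, name), dst_root).replace(os.sep, "/")
        for root, _, files in os.walk(dst_root)
        for name in files
    )

print(copied_files)
```

This still makes a copy, so it does not answer the question about configuring the server itself; it only avoids writing and then deleting the files that are not needed. Directories with no matching files are copied as empty directories.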
@LakshyAAAgrawal my PR #6 is ready for your review. Excited to get this in!
Hi, and thanks for the excellent repo.
Can you share some tips on the process of adding a new language?
For example, if I want to add support for JavaScript, where should I start?