This is an interesting topic; however, I have a few things to note:
The dataset you are working with is very large. Given the time available, can you really explore all the areas on arXiv, or should you restrict the problem to certain fields, such as computer science?
I haven't looked at the data yet, but will you run into problems such as this one: when you click on an author, arXiv also returns publications by other authors with similar names. Will this happen when you crawl the data, or can you retrieve the exact email/affiliation associated with that author? (These are data issues, so you should start the crawling early to see whether there are any potential problems.)
How are you going to process the text data? This should also be clearly explained in your proposal.
For the plan, I think you should explain who members 1, 2, and 3 are :))
You could also be more specific about the technologies/libraries you will use to achieve your goal.