sourcerer-io / sourcerer-app

🦄 Sourcerer app makes a visual profile from your GitHub and git repositories.
https://sourcerer.io/start
MIT License

Support for Org-Mode #492

Open PabloPavan opened 5 years ago

PabloPavan commented 5 years ago

Support for Org-Mode files could be added. Code in many languages, e.g., Shell, Python, and R, is written inside files of this format.

sergey48k commented 5 years ago

@yaronskaya could you take a look?

yaronskaya commented 5 years ago

Hi @PabloPavan. Could you please give an example of such files?

PabloPavan commented 5 years ago

Sure, the extension of this file is .org. Below is an example of such a file with R code blocks. Each code block starts with the tag #+begin_src R and ends with #+end_src. Other programming languages can be used too, including several different languages in the same org file.

This file starts with a header that contains some attributes of the file. The " * " marks a heading, similar to " # " in Markdown.

This file is written and executed in Emacs!

#+TITLE: Overlap Analysis
#+AUTHOR: Pablo Pavan
#+LATEX_HEADER: \usepackage[margin=2cm,a4paper]{geometry}
#+TAGS: Pablo(P) Jean(J) noexport(n) deprecated(d) success(s) failed(f) pending(p)
#+EXPORT_SELECT_TAGS: export
#+EXPORT_EXCLUDE_TAGS: noexport
#+SEQ_TODO: TODO(t!) STARTED(s!) WAITING(w!) REVIEW(r!) PENDING(p!) ON-HOLD(o!) | DONE(d!) CANCELLED(c!) DEFERRED(f!) DEPRECATED(x!)
#+STARTUP: overview indent
#+OPTIONS: ^:nil
#+OPTIONS: _:nil
#+PROPERTY: header-args :eval never-export 

* First idea
** Introduction 
The picture below demonstrates our input and desired output. We have 5
intervals (I added one to the original, to cover the holes case).
We want to decompose these intervals (labelled 1..5) into
non-overlapping sub-intervals (labelled A..I). So for example interval
1 [1,5] can be decomposed into sub-intervals A [1,4] and B [4,5].
** R Solution 
First we set up the data. Then we take the unique endpoints and sort
them, creating new intervals from the adjacent numbers.
Finally, we need to join these new sub-intervals to the
originals. This can be used to exclude holes, and also to identify
which sub-intervals are needed to construct which primary
intervals. Here, a binary-search based interval join is used
(foverlaps).
From the many-to-many relations table we can see which interval ids
(intid) match to which sub-interval ids (subid). For example, intid 1
matches to subid A and subid B. Note that sub-interval H is not
present in this table, since it is a hole.
*** Code

#+begin_src R :results output :session :exports both
# setup data and start decomposition
library(data.table)
input <- c(1,5,4,9,6,12,11,17, 18,20)
intervals <- data.table(matrix(input,ncol=2,byrow=TRUE))
print("intervals 1")
intervals
endpoints <- sort(unique(input))
print("endpoints 1 ")
endpoints
decomp <- data.table(matrix(c(head(endpoints, -1), endpoints[-1]), ncol=2))
print("decomp 1")
decomp
# align decomposition to segs
intervals[, intid := seq_len(length(input)/2)]
print("intervals 2")
intervals
decomp[, subid := LETTERS[seq_len(length(endpoints)-1)]]
print("decomp 2")
decomp
setkeyv(decomp, c('V1','V2'))
setkeyv(intervals, c('V1','V2'))
print("decomp")
decomp
print("intervals")
intervals
relations <- foverlaps(decomp, intervals, type='within', nomatch=0)
print("relations 1")
relations
multiplot <- function(..., plotlist=NULL, file, cols=1, layout=NULL) {
  require(grid)
  # Make a list from the ... arguments and plotlist
  plots <- c(list(...), plotlist)
  numPlots = length(plots)
  # If layout is NULL, then use 'cols' to determine layout
  if (is.null(layout)) {
    # Make the panel
    # ncol: Number of columns of plots
    # nrow: Number of rows needed, calculated from # of cols
    layout <- matrix(seq(1, cols * ceiling(numPlots/cols)),
                    ncol = cols, nrow = ceiling(numPlots/cols))
  }
  if (numPlots==1) {
    print(plots[[1]])
  } else {
    # Set up the page
    grid.newpage()
    pushViewport(viewport(layout = grid.layout(nrow(layout), ncol(layout))))
    # Make each plot, in the correct location
    for (i in 1:numPlots) {
      # Get the i,j matrix positions of the regions that contain this subplot
      matchidx <- as.data.frame(which(layout == i, arr.ind = TRUE))
      print(plots[[i]], vp = viewport(layout.pos.row = matchidx$row,
                                      layout.pos.col = matchidx$col))
    }
  }
}
library(ggplot2)
p1 <- ggplot(intervals, aes(x=V1,xend=V2,y=intid,yend=intid))+geom_segment()+geom_vline(xintercept=endpoints, linetype=3)+xlab('')+ylab('')+ggtitle('Input')
# solution
p2 <- ggplot(relations)+
  geom_segment(aes(x=i.V1, xend=i.V2, y=intid,yend=intid, color=as.factor(subid)))+
  geom_vline(xintercept=endpoints, linetype=3)+
  geom_text(aes(x=(i.V1+i.V2)/2, y=intid+0.2, label=subid), color='black')+
  geom_segment(data=decomp, aes(x=V1, xend=V2, y=0, yend=0, color=as.factor(subid)))+
  geom_text(data=decomp, aes(x=(V1+V2)/2, y=0+0.2, label=subid), color='black')+
  geom_hline(yintercept=0.5)+guides(color='none')+xlab('')+ylab('')+ggtitle('Output')
multiplot(p1,p2)
#+end_src

#+RESULTS:
#+begin_example
data.table 1.10.4.3
  The fastest way to learn (by data.table authors): https://www.datacamp.com/courses/data-analysis-the-data-table-way
  Documentation: ?data.table, example(data.table) and browseVignettes("data.table")
  Release notes, videos and slides: http://r-datatable.com
[1] "intervals 1"
   V1 V2
1:  1  5
2:  4  9
3:  6 12
4: 11 17
5: 18 20
[1] "endpoints 1 "
 [1]  1  4  5  6  9 11 12 17 18 20
[1] "decomp 1"
   V1 V2
1:  1  4
2:  4  5
3:  5  6
4:  6  9
5:  9 11
6: 11 12
7: 12 17
8: 17 18
9: 18 20
[1] "intervals 2"
   V1 V2 intid
1:  1  5     1
2:  4  9     2
3:  6 12     3
4: 11 17     4
5: 18 20     5
[1] "decomp 2"
   V1 V2 subid
1:  1  4     A
2:  4  5     B
3:  5  6     C
4:  6  9     D
5:  9 11     E
6: 11 12     F
7: 12 17     G
8: 17 18     H
9: 18 20     I
#+end_example
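
To make the block structure described above concrete, here is a minimal sketch in Python of how the code inside an .org file could be separated from the surrounding prose by looking only at the #+begin_src / #+end_src tags. It is an illustration under stated assumptions, not Sourcerer code: the names `count_src_lines`, `BEGIN_SRC`, and `END_SRC` are hypothetical, and the sketch ignores header arguments, inline source, and the #+RESULTS sections.

```python
import re
from collections import Counter

# Hypothetical helper patterns (not part of any existing tool).
# "#+begin_src <lang> ..." opens a block; Org keywords are case-insensitive.
BEGIN_SRC = re.compile(r"^\s*#\+begin_src\s+(\S+)", re.IGNORECASE)
# "#+end_src" closes the block.
END_SRC = re.compile(r"^\s*#\+end_src\b", re.IGNORECASE)


def count_src_lines(org_text):
    """Count non-blank lines inside source blocks, grouped by language."""
    counts = Counter()
    current_lang = None  # None while we are outside any source block
    for line in org_text.splitlines():
        if current_lang is None:
            match = BEGIN_SRC.match(line)
            if match:
                current_lang = match.group(1)
        elif END_SRC.match(line):
            current_lang = None
        elif line.strip():
            counts[current_lang] += 1
    return counts


# A tiny made-up Org document, used only to exercise the function.
sample = """\
#+TITLE: Demo
* Heading
Some prose that should not be counted.
#+begin_src R :results output
x <- c(1, 5, 4, 9)
sort(unique(x))
#+end_src
#+begin_src python
print("hello")
#+end_src
"""

print(dict(count_src_lines(sample)))
```

Only the lines between the tags are counted, so the header keywords, headings, and prose are skipped while the R and Python code are attributed to their own languages.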

yaronskaya commented 5 years ago

@PabloPavan it looks like this is a kind of documentation, similar to a README. We currently do not take text and documentation files into account.